Numerical Rule Mining for Prediction of Wheat and Vine Diseases

Team and supervisors
PhD Director
Alexandre Termier
Co-director(s), co-supervisor(s)
Luis Galárraga
Contact
Luis Galárraga
PhD subject

It is a national priority to reduce the amount of phytosanitary products used to treat diseases in plantations. A controlled use of such products carries both economic and environmental benefits. It can be achieved through monitoring procedures and tools for plant health. Since 2009, the Bulletin de Santé du Végétal (BSV) has delivered information at the regional scale to help farmers make decisions about the use of phytosanitary products. The production of the bulletins relies on a disease surveillance and monitoring network. This network has collected large volumes of data, including over a million data points for wheat plantations. However, little has been done to exploit these data, in particular to predict the incidence of diseases in plantations. To date, some statistical studies have addressed specific questions about wheat and vine diseases. Nonetheless, the potential of data mining techniques to discover new, practical knowledge in this domain remains unexplored.

This thesis aims to apply data mining techniques to disease monitoring data in order to predict the incidence of plant diseases in wheat plantations (data from Arvalis) and vineyards (data from Epicure - IFV). Data mining techniques can discover interesting patterns and correlations in data. Such patterns provide explicit explanations that help scientists understand a domain and make predictions about it. One example in the domain of vineyard diseases is the rule: "If the grapevine has weak branches and the weather is rainy, then the risk of grape canker is high." Traditionally, data mining techniques operate mostly on qualitative attributes. In our example, all the attributes involved in the rule are qualitative (symptom = "weak branches", weather = "rainy", risk = "high"). However, plant disease dynamics can also be explained by numerical attributes. One example is the rule: "If the disease is excoriose, then the percentage incidence in the vineyard is a linear function of precipitation and temperature." Such a rule combines qualitative data (disease = "excoriose") with mathematical functions on numerical attributes. Numerical data have traditionally been explained with regression models. Such models cannot directly account for equally valuable categorical information, such as the type of disease. Numerical rule mining reconciles traditional data mining with numerical models and can handle heterogeneous datasets rich in both categorical and numerical attributes.
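The following minimal Python sketch shows the difference between the two kinds of rules on a small toy dataset. Everything in the dataset is invented for illustration: the `Observation` record, its field names and units, and all values are hypothetical and do not come from the Arvalis or Epicure data.

For the qualitative rule, the sketch computes confidence. Confidence is the fraction of records matching the rule body that also match its head.

For the numerical rule, the sketch first keeps only the records where the categorical condition holds. It then fits incidence as a linear function of precipitation and temperature using ordinary least squares.

This is not the mining method the thesis will develop. It only illustrates what each kind of rule expresses. A real rule miner would also have to search the space of candidate conditions and functions.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Hypothetical record layout mixing categorical and numerical attributes.
    disease: str
    symptom: str
    weather: str
    risk: str
    precipitation: float  # hypothetical unit: mm
    temperature: float    # hypothetical unit: degrees Celsius
    incidence: float      # percentage of affected plants


# Invented toy data; the excoriose rows follow incidence = 2 + 0.5*precip + 1.0*temp exactly.
ROWS = [
    Observation("excoriose", "weak branches", "rainy", "high", 10, 15, 22),
    Observation("excoriose", "none", "dry", "low", 0, 20, 22),
    Observation("excoriose", "weak branches", "rainy", "high", 30, 12, 29),
    Observation("excoriose", "none", "dry", "low", 4, 25, 29),
    Observation("mildew", "weak branches", "rainy", "high", 20, 18, 60),
    Observation("mildew", "weak branches", "rainy", "low", 5, 22, 10),
]


def matches(row, conditions):
    """True if the row satisfies every attribute = value condition."""
    return all(getattr(row, attr) == value for attr, value in conditions.items())


def confidence(rows, body, head):
    """Empirical confidence of the qualitative rule body => head, i.e. P(head | body)."""
    covered = [r for r in rows if matches(r, body)]
    if not covered:
        return 0.0
    return sum(matches(r, head) for r in covered) / len(covered)


def solve_linear(matrix, vector):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_numerical_rule(rows, condition, features, target):
    """Least-squares fit target = w0 + w1*f1 + ... on the rows where condition holds."""
    subset = [r for r in rows if matches(r, condition)]
    X = [[1.0] + [float(getattr(r, f)) for f in features] for r in subset]
    y = [float(getattr(r, target)) for r in subset]
    k = len(features) + 1
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve_linear(xtx, xty)


if __name__ == "__main__":
    conf = confidence(ROWS, {"symptom": "weak branches", "weather": "rainy"}, {"risk": "high"})
    print(f"confidence = {conf:.2f}")  # confidence = 0.75
    w = fit_numerical_rule(ROWS, {"disease": "excoriose"},
                           ["precipitation", "temperature"], "incidence")
    print("incidence = {:.2f} + {:.2f}*precipitation + {:.2f}*temperature".format(*w))
    # incidence = 2.00 + 0.50*precipitation + 1.00*temperature
```

The linear function is fitted only on the subset selected by the categorical condition. This is what lets the rule combine both kinds of attributes. A single regression over all records would mix the excoriose and mildew dynamics of the toy data.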

Keywords
numerical rule mining, plant disease, vineyard, wheat, data mining
Location
IRISA - Campus universitaire de Beaulieu, Rennes