
Making Data Science more accessible by bridging Data Mining and Artificial Intelligence

Team and supervisors
Department / Team: 
Team website: 
Thesis supervisor
Alexandre Termier
Co-supervisor(s)
Torsten Schaub
Thomas Guyet
Thesis subject

The tremendous and ever-increasing volumes of data collected today hold the keys to new discoveries, in both industrial and scientific domains. Such discoveries will be made possible by “Data Science”, the domain concerned with making sense of data. The main problem of Data Science today is that, despite considerable improvements, it remains a mostly “manual” process: current data analysis tools (e.g. RapidMiner or KNIME) require a human to select the relevant data processing / data mining operators, to set their parameters, and to connect them in a data analysis workflow. This makes data analysis a lengthy, partial and error-prone process.
There are some efforts to automate parts of the data science process. Some approaches use a mix of Artificial Intelligence planning techniques and meta-learning [NHK14], while others are based on optimization techniques [BBBK11]. Both families focus on the classification task, and in practice are limited to simple workflows with one or two operators. While this is already helpful, there are many more data science tasks, and actual workflows can have dozens of operators.
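As a rough illustration of the optimization-based family, the sketch below runs a plain random search over a two-operator workflow (a preprocessing step followed by a threshold classifier). The toy dataset, the operator names and the search space are all assumptions made for this example; it is not the method of [BBBK11], only a minimal stand-in for the idea of searching over operator choices and their parameters.

```python
import random

# Toy dataset (an assumption for this sketch): label is 1 when |x| >= 2.
XS = [-3, -2, -1, 1, 2, 3]
YS = [1 if abs(x) >= 2 else 0 for x in XS]

# Hypothetical preprocessing operators the search can choose from.
PREPROCESSORS = {"identity": lambda x: x, "abs": abs}


def accuracy(config):
    """Score a two-operator workflow: preprocess, then threshold-classify."""
    prep = PREPROCESSORS[config["prep"]]
    preds = [1 if prep(x) > config["threshold"] else 0 for x in XS]
    return sum(p == y for p, y in zip(preds, YS)) / len(XS)


def random_search(n_trials=100, seed=0):
    """Sample workflow configurations at random and keep the best one."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        config = {
            "prep": rng.choice(sorted(PREPROCESSORS)),  # operator choice
            "threshold": rng.uniform(0.0, 4.0),         # operator parameter
        }
        history.append((accuracy(config), config))
    best_score, best_config = max(history, key=lambda item: item[0])
    return best_score, best_config, history


if __name__ == "__main__":
    score, config, _ = random_search()
    print(score, config)
```

Even in this tiny setting the search space mixes a discrete choice (which operator) with a continuous one (its parameter), which hints at why such approaches become hard to scale to workflows with dozens of operators.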
In this PhD, we are interested in the challenging setting of exploratory data science, where the analyst cannot provide a precise goal to the system. This is the case in many data mining situations, which rely on tools such as pattern mining. In such cases, traditional planning techniques cannot operate, as planning decisions may depend on intermediate analyses of the data or on interactions with the analyst. Similar situations arise in robotics, where plans may need to be updated according to sensory input. This has led to the design of continual planning techniques [BN09], where the steps of Planning, Execution and Monitoring are integrated, and where the plan is constantly updated according to newly acquired knowledge.
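The interleaving of Planning, Execution and Monitoring can be sketched as a small loop. The code below is only a toy illustration under stated assumptions: the operators (`load`, `profile`, `deduplicate`, `mine`), the state dictionary and the rule-based `plan` function are hypothetical and do not come from [BN09]. The point it shows is that the need for a `deduplicate` step is only discovered after an intermediate analysis has run, which forces a plan revision.

```python
# Hypothetical workflow operators; each one reads and updates a shared state.
def load(state):
    state["rows"] = [3, 1, 4, 1, 5, 9, 2, 6]
    state["loaded"] = True


def profile(state):
    # Intermediate analysis: its outcome is unknown at planning time.
    state["has_duplicates"] = len(set(state["rows"])) < len(state["rows"])
    state["profiled"] = True


def deduplicate(state):
    state["rows"] = sorted(set(state["rows"]))
    state["has_duplicates"] = False


def mine(state):
    # Stand-in for a mining step: keep the odd values.
    state["patterns"] = [r for r in state["rows"] if r % 2 == 1]
    state["mined"] = True


OPERATORS = {"load": load, "profile": profile,
             "deduplicate": deduplicate, "mine": mine}


def plan(state):
    """Planning: return the remaining steps given current knowledge."""
    if not state.get("loaded"):
        return ["load", "profile", "mine"]
    if not state.get("profiled"):
        return ["profile", "mine"]
    if state.get("has_duplicates"):
        return ["deduplicate", "mine"]
    if not state.get("mined"):
        return ["mine"]
    return []


def continual_planning(max_steps=20):
    state, trace, revisions = {}, [], 0
    current = plan(state)
    while current and len(trace) < max_steps:
        step = current[0]
        OPERATORS[step](state)       # Execution
        trace.append(step)
        revised = plan(state)        # Monitoring: re-plan from new knowledge
        if revised != current[1:]:   # the old plan no longer fits
            revisions += 1
        current = revised
    return trace, state, revisions


if __name__ == "__main__":
    print(continual_planning())
```

A classical planner would commit to the initial three-step plan up front; here the plan is recomputed after every executed step, so knowledge acquired during execution can change what happens next.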
The objective of the PhD is to propose novel Data Science techniques that start from a dataset and loosely specified goals, formulate hypotheses about the data, exploit techniques from cognitive robotics [LL08], such as continual planning, to progressively build analysis workflows for checking these hypotheses, and present the results of the most promising ones to the analyst. We also want to integrate a meta-learning component, i.e. exploit the past performance of the system as well as human-made workflows, in order to improve over time and to allow adaptation to certain domains or certain types of analysts (e.g. expert or non-expert).
The framework chosen for this work is Answer Set Programming [GKK+11], a modern logic programming language that is increasingly used in cognitive robotics, due to its capacity to integrate planning, preferences, and domain knowledge in a simple, declarative way [Lif02, ARSS15].
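To give a feel for what combining planning, preferences and domain knowledge declaratively means, here is a brute-force Python stand-in, not ASP and not a solver. Domain knowledge is stated as prerequisite constraints, the goal as a required operator, and preferences as a cost to minimize; candidate workflows are generated, tested against the constraints, then ranked. All operator names, constraints and costs are assumptions made for this sketch.

```python
from itertools import permutations

# Hypothetical operators available to the workflow builder.
OPERATORS = ["load", "clean", "discretize", "mine_patterns"]

# Domain knowledge: each operator lists what must have run before it.
REQUIRES = {
    "clean": {"load"},
    "discretize": {"load"},
    "mine_patterns": {"discretize"},
}

# Goal: the workflow must contain this operator.
GOAL = "mine_patterns"

# Preferences: lower total cost is better (values are illustrative).
COST = {"load": 1, "clean": 2, "discretize": 1, "mine_patterns": 3}


def is_valid(workflow):
    """Test step: goal reached and every prerequisite satisfied in order."""
    if GOAL not in workflow:
        return False
    done = set()
    for op in workflow:
        if not REQUIRES.get(op, set()) <= done:
            return False
        done.add(op)
    return True


def total_cost(workflow):
    return sum(COST[op] for op in workflow)


def solve():
    """Generate all orderings of all operator subsets, test, then rank."""
    candidates = (
        wf
        for size in range(1, len(OPERATORS) + 1)
        for wf in permutations(OPERATORS, size)
    )
    valid = [wf for wf in candidates if is_valid(wf)]
    best = min(valid, key=total_cost)
    return best, valid


if __name__ == "__main__":
    best, valid = solve()
    print(best, total_cost(best), len(valid))
```

The appeal of a declarative framework is that only the constraints and preferences are written down; the exhaustive enumeration used here would be replaced by the solver's own search.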

The candidate should be interested in Data Mining and Artificial Intelligence.
Experience in constraint and/or logic programming (e.g. ASP, CP, Prolog) is a plus, but is not mandatory.


[ARSS15] Benjamin Andres, David Rajaratnam, Orkunt Sabuncu, and Torsten Schaub. Integrating ASP into ROS for reasoning in robots. In Francesco Calimeri, Giovambattista Ianni, and Miroslaw Truszczynski, editors, Logic Programming and Nonmonotonic Reasoning - 13th International Conference, LPNMR 2015, Lexington, KY, USA, September 27-30, 2015. Proceedings, volume 9345 of Lecture Notes in Computer Science, pages 69–82. Springer, 2015.

[BBBK11] James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, Granada, Spain, pages 2546–2554, 2011.

[BN09] Michael Brenner and Bernhard Nebel. Continual planning and acting in dynamic multiagent environments. Autonomous Agents and Multi-Agent Systems, 19(3):297–331, 2009.

[GKK+11] Martin Gebser, Benjamin Kaufmann, Roland Kaminski, Max Ostrowski, Torsten Schaub, and Marius Thomas Schneider. Potassco: The Potsdam answer set solving collection. AI Commun., 24(2):107–124, 2011.

[Lif02] Vladimir Lifschitz. Answer set programming and plan generation. Artif. Intell., 138(1-2):39–54, 2002.

[LL08] Hector J. Levesque and Gerhard Lakemeyer. Cognitive robotics. In Frank van Harmelen, Vladimir Lifschitz, and Bruce W. Porter, editors, Handbook of Knowledge Representation, volume 3 of Foundations of Artificial Intelligence, pages 869–886. Elsevier, 2008.

[NHK14] P. Nguyen, Melanie Hilario, and Alexandros Kalousis. Using meta-mining to support data mining workflow planning and optimization. J. Artif. Intell. Res. (JAIR), 51:605–644, 2014.

Start date: 
Keywords: 
Data mining, Artificial Intelligence, Planning, Answer set programming
IRISA - Campus universitaire de Beaulieu, Rennes