D7 - Data and knowledge management

Person in charge
Anne SIEGEL (CNRS Research Director)
Description

DATA AND KNOWLEDGE MANAGEMENT (DKM - D7)

The DKM department focuses on data modeling, data management and data mining by exploiting the relations between data and knowledge. Our goal is the explainable production of semantically rich knowledge from complex (interdependent, unstructured, unbalanced) datasets provided by application domains (biology, environment or industry).

Our research is applied to interdependent, heterogeneous, incomplete and unbalanced datasets in the domains of molecular biology, environment, pharmacovigilance and health, open crowdsourcing campaigns, and large-scale exploitation systems. One characteristic of the department is therefore its high level of interdisciplinary collaboration: the department hosts, as associated members, researchers and engineers from INSERM, INRAE and L’Institut Agro (formerly “Agrocampus”), and is involved in long-term applied projects supported by the PIA (#DigitAg, Idealg, …).

Our strong connection with applications has shown that each domain-specific application comes with constraints that must be satisfied, whereas generic data mining methods are not necessarily designed to satisfy them. A common focus of the department is therefore to guarantee the accuracy and validity of the results produced by the methods we develop. To that end, we provide reliability indicators that give users the opportunity to understand how and why data have been analyzed:

  • Explainability is made feasible by providing explanation schemes or algorithms based on formal structures (patterns, formal concepts, logic programs, grammars…).
  • Exhaustiveness and/or representativeness are obtained by reporting all the solutions to the data management tasks we consider, and by taking into account the intrinsic characteristics of datasets (uncertainty, rare events).

To achieve that goal, our originality is to enrich data analysis and management methods with knowledge-based and reasoning-based approaches. Our strategy is to take advantage of Semantic Web technologies and dedicated data structures in order to draw on a priori knowledge of the expert domain and to structure, homogenize and thereby facilitate data exploration.

The department comprises the following teams:

  • DRUID (Databases, privacy, belief functions) provides models and algorithms for the management of uncertain, user-generated, interlinked data, including privacy issues.
  • DYLISS (Bioinformatics, Semantic Web, automated reasoning, systems biology) develops automated reasoning and querying languages for the representation and integration of heterogeneous data in life sciences.
  • GENSCALE (Bioinformatics, data structures, sequence algorithms) develops efficient data structures and algorithms for the analysis of large-scale genomic data.
  • LACODAM (Data mining, machine learning) develops data mining and machine learning approaches for decision support and knowledge production. Most of the proposed approaches contribute to the field of interpretable AI.
  • SemLIS (Semantic Web, data mining, Natural Language Processing) develops symbolic methods for knowledge extraction and acquisition, and for user-centered interactive exploration and querying of knowledge bases.
  • SHAMAN (Databases, automated reasoning, knowledge representation, Semantic Web, privacy) investigates the use of symbolic Artificial Intelligence, in particular automated reasoning, to design flexible, cooperative and quality-aware knowledge-based data management systems.