Laure Berti-Équille

Associate Professor

(Permanent position)

Université de Rennes 1
Campus Universitaire de Beaulieu
35042 Rennes cedex, France

Currently @ AT&T Labs Research

180 Park Avenue, room E217

Florham Park, NJ 07932, USA

Phone: +1 973 360 8407

Fax: +33 2 99 84 71 71

Email: Laure.Berti-Equille @ irisa . fr


                                  

*      Research Areas

*      Research Activities

*      Publications

*      Collaborations

*      Students

*      Teaching Activities

*      Former Research Groups

*      Biographical Sketch

 

 

 

 

From September 2007 to August 2009, I am a visiting researcher at AT&T Labs Research, Florham Park (New Jersey), in the Database Group led by Divesh Srivastava.

My research project is supported by a Marie Curie OIF fellowship of the European Commission (Grant FP6-MOIF-CT-2006-041000).

 

 

Research Areas

*      Data quality in databases, quality metadata management

*      Quality-aware query processing

*      Quality-aware warehousing and data mining

*      Data cleaning

*      Quality-driven data integration

*      Recommender systems, adaptive information filtering

*      Multimedia mining and content-based information retrieval (CBIR)

*      Application domains: business intelligence, technological watch, biomedical data, CRM, E-learning

 

Research Activities

Data Quality in Multi-Source Information Systems:

Evaluation, Monitoring, Extended-Query Processing, and Quality-Awareness for Data Mining
 

Poor quality of the data stored in database-backed information systems is a widespread problem in governmental, commercial, and industrial environments. Alarming situations involving various information quality problems can no longer be ignored, and both theoretical and pragmatic approaches urgently need to be proposed and validated. As a consequence, information quality is becoming a topic of growing interest in the academic and industrial communities. Many processes and applications (such as information system integration, information retrieval, and knowledge discovery from databases) require various forms of data preparation or repair, because the algorithms dedicated to the application assume that their input data conform to well-behaved distributions and contain no missing, inconsistent, or incorrect values. This leaves a large gap between the dirty data that are actually available and the machinery available to process them for application purposes.

 

The objective of my work is to propose theoretically founded solutions for evaluating and controlling data quality in multi-source information systems, for structured, semi-structured, and complex data, following four approaches:

1) A preventive approach based on system-centred engineering for continuously controlling input data quality for very large databases and data warehouses

2) A diagnostic approach based on data mining techniques for the fast detection of anomalies (outliers, duplicates, inconsistencies) in very large data sets

3) A corrective approach with cost-based models to predict the cost of data corrections and optimally plan these operations

4) An adaptive approach based on a query language extension and optimization for the declaration and manipulation of quality-constrained data.

 

My research interests since 2000 are the following:

 

Quality-Aware Query Processing (2000 – present)

This main axis of my research focuses on the definition and optimization of a quality-constrained query language that allows data quality metrics and statistical constraints to be specified declaratively in order to measure and control data quality.

 

Data quality awareness for knowledge discovery and data mining (2002 – present)

This axis of my research proposes a generic framework for integrating data quality indicators (quality metadata) into the KDD process, particularly for quality-aware association rule mining. A cost-based probabilistic model has been proposed for selecting legitimately interesting rules. Experiments on the challenging KDD-CUP-98 datasets show that variations in data quality have a strong impact on the cost and quality of the discovered association rules. They confirm that the management and analysis of quality metadata have to be integrated into the KDD process to ensure the quality of mining results.

 

Quality-Aware Biological Data Warehousing and Cleaning (2000 – 2007)

I have contributed to the design of several specialized biological databases of Génopole Ouest, LIMS (Laboratory Information Management Systems), and biomedical data warehouses that intensively collect data from various biomedical sources. My work, in collaboration with Fouzia Moussouni (INSERM U522, Associate Prof., University of Rennes 1), focuses on the acquisition, cleaning, and integration of semi-structured biological data with data quality monitoring.

 

Image Feature Mining for Optimizing Query-by-Example Processing (2002-2005)

This work was carried out while supervising the PhD thesis of Anicet Kouomou-Choupo (PhD 2002-2006) with Annie Morin (Associate Prof., HDR, University of Rennes 1). It consisted of adapting data mining techniques and proposing a technique to improve the performance of image query-by-example execution strategies over multiple low-level MPEG-7 visual features. The technique first pre-clusters the large image database and then schedules the processing of the feature clusters so that query results are delivered progressively.
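
The two steps described above (offline pre-clustering, then cluster-by-cluster processing with progressive results) can be sketched as follows. This is only a minimal illustration under strong assumptions, not the method developed in the thesis: each image is reduced to a single feature vector (the actual work combines several MPEG-7 features), the centroids are given rather than learned, clusters are simply visited in order of centroid distance to the query, and the names `pre_cluster` and `progressive_query` are hypothetical.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pre_cluster(features: List[Vector], centroids: List[Vector]) -> List[List[int]]:
    """Offline step: assign each image (by index) to its nearest centroid.

    Assumption: centroids are supplied by the caller; a real system
    would learn them, e.g. with a clustering algorithm.
    """
    clusters: List[List[int]] = [[] for _ in centroids]
    for i, f in enumerate(features):
        nearest = min(range(len(centroids)), key=lambda c: dist(f, centroids[c]))
        clusters[nearest].append(i)
    return clusters


def progressive_query(
    query: Vector,
    features: List[Vector],
    centroids: List[Vector],
    clusters: List[List[int]],
    k: int,
) -> Iterator[List[Tuple[float, int]]]:
    """Online step: scan clusters nearest-centroid first.

    After each cluster is processed, yield the current top-k
    (distance, image index) pairs, so the answer improves progressively.
    """
    order = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))
    best: List[Tuple[float, int]] = []
    for c in order:
        best.extend((dist(query, features[i]), i) for i in clusters[c])
        best = sorted(best)[:k]
        yield list(best)
```

A caller can stop consuming the iterator as soon as the intermediate answer is good enough, which is the point of scheduling the most promising clusters first.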

 

Metadata Management and Multimedia Quality of Service (2002-2005)

This axis of my research is tightly coupled with the first phase of ENTHRONE (FP6-2002-IST-2.3.1.8), a European Integrated Project of the 6th European Framework Program. In the ENTHRONE project, "End-to-End QoS through Integrated Management of Content, Networks and Terminals" (December 2003 to March 2006), I led two tasks, "Content Creation Authoring Tool TVM Processor" and "Metadata Definition and Specification", as the coordinator for INRIA Rennes. I was in charge of the specification, generation, extraction, and optimal selection of multimedia metadata (MPEG-21, MPEG-7, TV-AnyTime) for indexing, retrieving, and personalizing multimedia content. In this context, a uniform metadata model was proposed, and M-Tool, a suite for metadata authoring and MPEG-21 quality-of-service adaptation of audiovisual content and multimedia streams, was developed.

 

 

 

Collaborations

 

International Collaborations and Joint Research Projects

*     Research Exchange Program with Università di Roma La Sapienza – CNR (Italy) working with Monica Scannapieco (IStat) on the project entitled “CLINIQ: From Data Cleaning to Information Quality” (2006-2007).

 

Contracts and Coordination of National Research Projects

*     Contract with SEMA Group and DGA (General Direction of the French Army), Établissement des Constructions Navales of the French Navy (DGA/ECN/CTSN) (1997-1998)

*     Contract for research collaboration with the French Army Military Schools of Coëtquidan (France) and joint supervision of the PhD Thesis of LCL Jean-André Benvenuti (2003-2008) with Éric Jacopin (Associate Prof., HDR, Écoles Militaires de Saint-Cyr, Coëtquidan)

*     Contract with GenieLog (2006), Quality of Stream Data.

*     QUADRIS: Quality of Multi-source Data and Information Systems, ANR-MMSA-05 ARA Masse de données, French research grant funded by ANR (2006-2009).

 

Participation in International Research Projects

*     Phase 1 of the Integrated European Project of the Sixth European Framework Program: ENTHRONE (End-to-end QoS through Integrated Management of Content, Networks and Terminals), FP6-2002-IST-2.3.1.8. Leadership of two tasks: WP4 task 4.4, "Content Creation Authoring Tool TVM Processor", and WP2 task 2.3, "Metadata Definition and Specification" (December 2003 - March 2006).

 

 

PhD Students

Anicet Kouomou-Choupo (December 2002 - February 2006)
Improving Similarity Search in Very Large Image Database with Multimedia Mining Techniques

 

Jean-André Benvenuti (December 2002 - December 2008)

Intelligent Parsing, Indexing and Querying of XML Pedagogical Materials for Military Staff Training

 

 

Visiting Students and Interns

Ravi Jain (February 2007-September 2007, Post-Doc INRIA)

Quality-Awareness for Data Clustering

 

Yongluan Zhou (October 2005-March 2006, Internship INRIA)

Quality-Driven Distributed Query Planning and Optimization Based on Data Quality Negotiation

 

Manuel Bès (September 2001-June 2002, Internship)

Comparative Study of Association Rule Discovery Algorithms for Genomic Data Mining

 

Anne Charlery (September 2002–June 2003, Internship)

 Indexing Techniques for Genomic Data

 

Mehrez Chaikha-Douaihy (February 2005-June 2005, Internship)

Adapting and Optimizing Content-Based Image Retrieval

 

Wilfried Jouve (February 2005-August 2005, Internship)

Enriching Multimedia Content Description for Broadcast Environments: From a Unified Metadata Model to a New Generation of Authoring Tool

 

 

 

 

 

Teaching Activities

 

Since September 2000, I have been teaching (lectures and practical work) at IFSIC (University of Rennes 1) on the following topics:

*     Database Management Systems

*     Data Warehousing and Data Mining

*     XML Technologies (XML, XPath, XSL, XQuery, native XML systems)

*     Object-Oriented Software Engineering (UML, OCL, design patterns)

*     Software Engineering Project Management

*     Bioinformatics and Design of Biomedical Databases

*     Data Quality

 

 

Former Research Groups

TEXMEX Project (INRIA Rennes) – Techniques for the exploitation of very large volumes of multimedia data - (2002-2005)

Multimedia metadata specification, extraction and exploitation
This axis of my research focused on the specification, generation, extraction, and optimal multi-criteria selection of relevant multimedia features and metadata for multimedia indexing and content-based information retrieval.

Multimedia Mining
This axis of my research focused on adapting clustering techniques and association rule discovery algorithms in order to support the adaptive reorganization of the indexing schema and adaptive query processing for very large image databases.

SYMBIOSE Project (INRIA Rennes) – Bioinformatics: Managing and Mining biological data - (2000-2002)

Biological Data Management and Mining

My work, in collaboration with Fouzia Moussouni (Associate Prof., Univ. Rennes 1), focused on the acquisition, cleaning, and integration of biomedical data with data quality control techniques.

 

Indexing and Retrieval of Genomic Databases

This axis of my research focused on proposing and testing algorithms for the adaptive indexing of genomic (string) sequences.

 

LIA (University of Avignon, France) - (1999-2000)

Information Retrieval and Quality of Service for multimedia contents

 

SIS Research Group (University of Toulon, France) - (1996-1999)

sQuaL project

The sQuaL project aimed to develop a decision-support information system able to manage multi-source data, textual data sources, and quality metadata, and to propose quality-based recommendation strategies for personalizing information. The application domain of my work was technological watch and business intelligence for the French Navy (DGA/ECN/CTSN, Toulon, France).

 

QIRi@D Project

The objective of the QIRi@D project was to combine the power of a query language with that of a search engine on semi-structured data, and to develop a system using the SgmlQL language for managing distributed SGML and XML documents. The principal aspects of the work were:

- enriching the documents with quality metadata and collaborative annotations,

- indexing both the textual contents and the structure of the enriched SGML and XML documents,

- exploiting the documents, i.e., processing distributed queries over multiple sites, and filtering and recommending personalized information depending on document quality.

 

 

Biographical Sketch

Laure Berti-Équille received an M.Sc. degree in Computer Science in 1996 from the University of Paris IX (France). She earned an M.Sc. degree in Physics and a Ph.D. degree in Computer Science from the University of Toulon (France), in 1995 and 1999 respectively. From 1999 to 2000, Laure worked as an assistant professor at the University of Avignon (France). In September 2000, she joined the IRISA Lab as a permanent associate professor of the University of Rennes 1 (France). Since September 2007, she has been a visiting researcher at AT&T Labs Research, New Jersey (United States), where she is involved in several projects related to Data Quality Management and Mining (2-year fellowship funded by the European Commission, FP6 Mobility Marie Curie OIF).

 

 

Last update: November 1st, 2008.