DRUID

Head of team
Zoltan MIKLOS (Faculty member, Université Rennes 1)

DRUID : Declarative & Reliable management of Uncertain, user-generated Interlinked Data

Recently, there has been increasing interest in data management methods. Statistical machine learning techniques, empowered by readily available, pay-as-you-go distributed computing power, are able to extract useful information from data. The international press, specialized or not, has echoed these remarkable results as a new spring for Artificial Intelligence in the broad sense. Data is sometimes even referred to as the “gold of the 21st century”. In all areas of business and science, organizations try to build huge datasets in order to profit from the benefits of the Artificial Intelligence revolution.

However, when datasets contain personal data, their collection and usage may lead to undesirable practices. In particular, there is growing interest in privacy, mirroring the still-growing interest in analytics over personal data. Machine learning and privacy can indeed be seen as two sides of the same coin: machine learning tries to extract relevant information from data, while privacy tends to blur information in order to hide identifying or sensitive details about individuals. Beyond the protection of the personal data fed into machine learning algorithms, which is guaranteed by privacy models and privacy-preserving algorithms, the fairness of the output is critical for mitigating discrimination in automatic or “semi-automatic” high-stakes decisions about individuals (e.g. laws, social rights, policing).

Unfortunately, neither of these needs, seamless machine learning nor privacy, is currently supported elegantly by mainstream data management systems. For example, machine learning operators are still treated as external procedures outside the query language, barely taken into account by the optimizer. Moreover, knowledge extraction tasks are hard to design without understanding the available data, so knowledge extraction should be considered an interactive process in which users guide the analysis. Privacy-preserving algorithms often make extensive use of cryptography, incurring prohibitive costs at the volumes typical of data management use cases. Additionally, the choice of a privacy model and of its parameters, among a large number of possible models, is barely understandable for non-expert database administrators. Finally, privacy and fairness are usually considered separately, without analyzing their mutual impact.
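To make the privacy-model parameter question concrete, here is a minimal, generic sketch of the Laplace mechanism for ε-differential privacy (an illustrative example, not DRUID's own code; the function names are hypothetical). The single parameter ε trades answer accuracy against protection, which is exactly the kind of calibration that is hard for a non-expert administrator.

```python
import math
import random

def laplace_noise(scale):
    # Inverse-transform sample from a Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(values, predicate, epsilon):
    """epsilon-differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one
    individual changes the count by at most 1), so Laplace noise
    with scale 1/epsilon suffices. Smaller epsilon means stronger
    privacy but a noisier answer.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)
```

With a very large ε the released answer stays close to the true count; as ε shrinks toward zero the noise dominates. Choosing where to sit on that curve is the utility/privacy trade-off the text refers to.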

These observations lay the ground for the goals of the DRUID team:

  • Propose mechanisms to better integrate machine learning methods with database logic and engines.
  • Propose interactive, human-in-the-loop data analysis and knowledge extraction methods, even over uncertain data.
  • Make privacy-preserving techniques meet the real-life constraints of data-centered systems, with a special focus on performance and intelligibility.
  • Design data-centered systems that are both private and fair.
Creation date
26/09/2014
Reporting institution
Université de Rennes 1
Location
Rennes (35) and Lannion (22)
Activity reports
druid2018_1.pdf (612.69 KB)
druid2017_0.pdf (884.84 KB)