Mine once, Update sometimes : Energy-Efficient Rule Mining on Knowledge Graphs via Provenance

Publié le mar 24/01/2023 - 18:41

Equipe

LACODAM

Site web de l'équipe

/equipes/lacodam

Date de début de thèse (si connue)

Septembre

Lieu

Rennes

Unité de recherche

IRISA - UMR 6074

Description du sujet de la thèse

Multiple AI-assisted tasks rely on the ability to learn patterns from knowledge graphs (KGs). Those patterns can take the form of logical rules such as birthCountry(x, y) => nationality(x, y) (people are usually citizens of the country they were born in), which are automatically discovered or mined from KGs using resource-intensive algorithms. Those algorithms assume that KGs are static. This, however, stands in sharp contrast with reality: modern KGs are continuously updated to be in sync with an ever-changing world where countries change governments, wars break out, and new animal species are discovered. This dynamicity can turn previously learned rules obsolete. If one or several countries abolish birthright citizenship, then the confidence of our example rule may decrease, and so our reliance on it. So far, the traditional way to handle data updates when mining KGs is to rerun the mining algorithm on the latest version of the KG. This can be prohibitively resource-intensive for large and dynamic KGs. We therefore envision a mechanism to mine the rules incrementally so that the computational cost of data updates is proportional to the size of the changes -- and not to the size of the data. We call this philosophy mine once, update sometimes.

Envisioned Solution and Research Questions

Incremental rule mining is an instance of update propagation for query results in databases. This is so because pattern mining relies on executing many queries on the data. For instance, the support of our example rule (which determines partly how much we should trust the rule) is the number of results of query Q(x, y) := birthCountry(x, y), nationality(x, y). If the KG changes, the support of this query may also change. To update the query’s solution count without rerunning the query from scratch, the literature in databases has proposed to propagate updates via how-provenance explanations. These are expressions that convey a trace of the interactions between the KG records that contribute to a given query solution. These explanations can be used as proofs to verify if a solution is still a solution after some changes in the data. Based on this observation, we propose to carry out rule mining once, store the mined rules, and then propagate subsequent updates from the KG to the different scores computed on the rules – in the spirit of the philosophy mine once, update sometimes. This thesis is therefore built upon two main research questions: (a) What is a good representation for the KG, the rules, and the provenance explanations in an incremental mining setting?; (b) how much time and energy consumption can be saved with a mine-once-update-sometimes approach to rule mining on KGs?.

Bibliographie

Boris Glavic. Data Provenance. Foundations and Trends in Databases, 9(3-4) :209–441, 2021.
Luis Galárraga, Christina Teflioudi, Katja Hose, and Fabian Suchanek. Fast Rule Mining in Ontological Knowledge Bases with AMIE+. VLDB Journal, 24(6), 2015.
Garima Gaur, Arnab Bhattacharya, and Srikanta Bedathur. How and Why is An Answer (Still) Correct ? Maintaining Provenance in Dynamic Knowledge Graphs. International Conference on Knowledge Management, 2020.
Daniel Hernández, Luis Galárraga, and Katja Hose. Computing How-Provenance forSPARQL Queries via Query Rewriting. Proc. VLDB Endow., 14(13) :3389–3401, 2021.

Liste des encadrants et encadrantes de thèse

Galárraga Luis

Type d'encadrement

Co-encadrant.e

Unité de recherche

IRISA

Département

D7 - Gestion des données et de la connaissance

Equipe

LACODAM

Termier Alexandre

Type d'encadrement

Directeur.trice de thèse

Unité de recherche

IRISA

Département

D7 - Gestion des données et de la connaissance

Equipe

LACODAM

Contact·s

Nom

Galárraga Luis

luis.galarraga@inria.fr

Téléphone

0299847129

Mots-clés

knowledge graphs, provenance, rule mining, energy efficient, incremental