Federated Query Scaler
FederatedQueryScaler is an INRIA "Exploratory Research Project" (2017-2019). It is a collaboration between the DYLISS team in Rennes and the WIMMICS team in Sophia-Antipolis. At the interface of data science and bioinformatics, RDF linked data provide an increasing quantity of structured data that can be cross-referenced across datasets. SPARQL offers a federation mechanism for writing queries that span multiple datasets. However, although data are available in formats that make their integration technically feasible, two main challenges remain:
- most users find it difficult to write SPARQL queries that fully leverage dataset interoperability,
- the poor performance of such queries severely hinders their applicability.
FederatedQueryScaler aims at determining the optimal decomposition of federated SPARQL queries, which is critical to linked data scalability. Our approach will be based on determining an abstraction of each endpoint's data that represents the classes of the dataset entities and the relations between these classes. Abstractions are therefore compact representations of the dataset structure that support scaling, as they do not necessarily depend on the size of the dataset.
keywords: Semantic Web, SPARQL, federated queries, linked data, RDF fragments.
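As an illustration of what such an abstraction could look like, here is a minimal Python sketch. It is not the project's implementation: the function name `compute_abstraction`, the in-memory list of triples, and the `ex:` identifiers are all assumptions made for the example. It collects the classes declared for each entity, then lifts every instance-level relation to a relation between classes.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # abbreviated form of the rdf:type predicate


def compute_abstraction(triples):
    """Return the set of (subject class, predicate, object class) edges.

    `triples` is an iterable of (subject, predicate, object) strings.
    Hypothetical helper, written only to illustrate the idea.
    """
    # First pass: record the classes declared for each entity.
    classes = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes[s].add(o)

    # Second pass: lift each instance-level relation to the class level.
    edges = set()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for subject_class in classes.get(s, ()):
            for object_class in classes.get(o, ()):
                edges.add((subject_class, p, object_class))
    return edges


# Toy dataset with hypothetical identifiers.
triples = [
    ("ex:geneA", RDF_TYPE, "ex:Gene"),
    ("ex:geneB", RDF_TYPE, "ex:Gene"),
    ("ex:protA", RDF_TYPE, "ex:Protein"),
    ("ex:protB", RDF_TYPE, "ex:Protein"),
    ("ex:geneA", "ex:encodes", "ex:protA"),
    ("ex:geneB", "ex:encodes", "ex:protB"),
]

abstraction = compute_abstraction(triples)
print(abstraction)
```

In this toy dataset, two genes and two proteins collapse into a single class-level edge. Adding more instances of the same classes would leave the abstraction unchanged, which is the sense in which it does not necessarily depend on dataset size.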
We are currently developing a tool that provides an intuitive graph-based visual interface that automatically composes a SPARQL query as the user navigates the dataset abstraction. Being able to compute automatically the abstraction of a remote dataset would therefore allow the user to compose a federated SPARQL query over multiple remote datasets. To address the challenge of executing such a query, it is necessary to revisit how SPARQL engines decompose federated queries and process the results. Linked data fragments are a recent breakthrough for improving performance. In this context, we will examine how the dataset abstractions can be combined with data fragments in CORESE/KGRAM and the STTL language developed at WIMMICS.
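To make the idea of composing a federated query more concrete, the following Python sketch builds a query string from a sequence of navigation steps. It is only an illustration under stated assumptions: the function `compose_federated_query`, the endpoint URLs, and the predicates are hypothetical and do not describe the tool's actual behaviour. The only standard element is the `SERVICE` keyword from SPARQL 1.1 Federated Query, which delegates a group of triple patterns to a remote endpoint.

```python
def compose_federated_query(patterns):
    """Build a federated SPARQL query from navigation steps.

    `patterns` is a list of (endpoint, subject_var, predicate, object_var)
    tuples. Hypothetical helper, written only to illustrate the idea.
    """
    # Group triple patterns by the endpoint that should evaluate them.
    by_endpoint = {}
    for endpoint, s, p, o in patterns:
        by_endpoint.setdefault(endpoint, []).append(f"?{s} <{p}> ?{o} .")

    # Collect the variables in order of first appearance.
    variables = []
    for _, s, _, o in patterns:
        for v in (s, o):
            if v not in variables:
                variables.append(v)

    # Emit one SERVICE block per endpoint.
    lines = ["SELECT " + " ".join(f"?{v}" for v in variables) + " WHERE {"]
    for endpoint, triple_patterns in by_endpoint.items():
        lines.append(f"  SERVICE <{endpoint}> {{")
        lines.extend(f"    {tp}" for tp in triple_patterns)
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical endpoints and predicates, for illustration only.
steps = [
    ("http://example.org/genes/sparql", "gene",
     "http://example.org/encodes", "protein"),
    ("http://example.org/pathways/sparql", "protein",
     "http://example.org/participatesIn", "pathway"),
]

query = compose_federated_query(steps)
print(query)
```

The shared variable `?protein` is what joins the two datasets. How an engine orders and decomposes these `SERVICE` blocks is the performance question the project addresses, and this sketch makes no attempt to optimise it.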
We are offering a 12- to 18-month postdoctoral position for developing the next generation of SPARQL query engines. The ideal profile would combine good knowledge of Semantic Web technologies (RDF and SPARQL) with programming skills.