KERDATA: Scalable Storage for Clouds and Beyond
KerData is a joint project-team of Inria, ENS Rennes and INSA Rennes and is part of the IRISA lab.
The KerData project-team is exploring innovative software architectures and systems for extreme-scale distributed data storage and processing. Our target underlying infrastructures range from extreme-scale supercomputers and clouds to distributed fog infrastructures and edge devices. Recently, we started to focus on exploring challenges raised by the emergence of an increasing number of scenarios in many areas (energy, personalised medicine, autonomous vehicles, digital twin-based manufacturing, etc.) that use hybrid combinations of these HPC/cloud/fog/edge infrastructures (forming what is now called the Digital Continuum, the Computing Continuum or the Transcontinuum). Our research follows three main directions.
Research axis 1: Convergence of HPC and Big Data at the level of data storage and processing
As the tools and cultures of High Performance Computing (HPC) and Big Data Analytics (BDA) have evolved in divergent ways, there is an increasing need for these areas to converge, as Big Computations still generate Big Data and Big Data need Big Computations for efficient analysis. At KerData, we focus on a key convergence milestone: defining and validating common abstractions and techniques for data storage and processing in support of complex workflows combining simulations and analytics running on hybrid HPC/cloud infrastructures. In particular, we investigate how blob storage systems could serve as a basis for such a converged storage abstraction. Preliminary efforts in this direction led to the Tyr storage system (Best Student Paper Award Finalist at SC16). Regarding the data processing level, we focus on a major challenge from the perspective of the HPC-BDA convergence: the design of a unified architecture enabling the joint use of in situ and in-transit processing (from the HPC area) with stream processing (from the Big Data analytics area). We are approaching this challenge by combining two ongoing efforts in our team: Damaris (for in situ processing) and KerA (for optimized stream processing).
Research axis 2: Efficient Edge, Cloud and hybrid Edge/Cloud data processing
The explosion of data generated by the Internet of Things (IoT) and the need for real-time analytics have resulted in a shift of data processing paradigms from centralized clouds towards decentralized, multi-tier computing infrastructures and services (edge computing). Our research aims to revisit current cloud storage and processing techniques to cope with the volatile requirements of newly emerging scenarios for data-intensive applications running on hybrid cloud/fog/edge systems at large scale, with a particular focus on streaming. In addition, we investigate new experimental methodologies and supporting software platforms that enable a complete analysis cycle for the execution of Digital Continuum applications, from deployment and configuration through experimentation to results collection and analysis, as a means to investigate the trade-offs related to the use of hybrid cloud/fog/edge infrastructures in terms of performance, resource usage or cost.
Research axis 3: Supporting AI across the digital continuum
Leveraging the Big Data phenomenon, artificial intelligence (and more specifically machine learning and deep learning) has recently gained momentum as a privileged means of gaining insights from Big Data. This may require integrating and processing, in a timely manner, high-frequency data streams from multiple sensors scattered over a large area. For instance, an earthquake detection and warning system can use machine learning to detect earthquakes and classify their magnitudes using data coming from numerous distributed sensors. In collaboration with machine learning experts and domain experts, we explore innovative design alternatives for distributed data processing architectures that leverage the edge/cloud digital continuum to support high-precision machine-learning-based analytics. A promising joint result in this direction, applied to early earthquake warning, received the Outstanding Paper Award for Social Impact at the AAAI-20 conference.