Efficient geo-distributed data stream processing

Thesis start date (if known)
As soon as possible
Location
Rennes
Research unit
IRISA - UMR 6074
Thesis topic description

The infrastructures that host most large Internet applications are becoming increasingly distributed. To deliver excellent performance to their users while reducing the usage of wide-area networking resources, fog computing extends the traditional cloud computing model with additional resources located close to the end-user devices [1].

Fog platforms have a very different geographical distribution from traditional clouds. Classical datacenter clouds are composed of many reliable and powerful machines located in a very small number of data centers and interconnected by very high-speed networks. In contrast, fogs are composed of a very large number of points-of-presence, each hosting a handful of weak and potentially unreliable servers, interconnected with each other by commodity long-distance networks.

Data stream processing is an attractive paradigm for analyzing IoT data in the fog before transmitting processed results to the cloud [2]. Stream processing engines allow programmers to express applications as a workflow of data transformations (operators) that execute over unbounded data streams. Workflows are organized as a directed acyclic graph where vertices represent operators and edges represent data streams. However, current data stream processing platforms such as Apache Flink [3] were not designed to operate in geo-distributed environments such as fog computing platforms, and parts of their implementation deliver suboptimal performance in this context.
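
To make the workflow model concrete, here is a minimal sketch in plain Python, assuming a push-based engine in which each operator forwards its output records to its downstream operators. The `Operator` class, the operator functions and the sample sensor records are hypothetical and do not correspond to the Apache Flink API; the sketch only shows how vertices (operators) and edges (streams) form a DAG, including a fan-out to two sinks and one stateful operator.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class Operator:
    """Hypothetical DAG vertex: applies fn to each record and forwards the results downstream."""

    def __init__(self, name: str, fn: Callable[[Any], List[Any]]):
        self.name = name
        self.fn = fn  # returns zero or more output records per input record
        self.downstream: List["Operator"] = []  # outgoing edges (data streams)

    def connect(self, other: "Operator") -> "Operator":
        """Add an edge self -> other and return other, so chains can be built."""
        self.downstream.append(other)
        return other

    def push(self, record: Any) -> None:
        for out in self.fn(record):
            for op in self.downstream:
                op.push(out)


alerts: List[str] = []
counts: Dict[str, int] = defaultdict(int)  # state held by the stateful operator


def parse(line: str) -> List[Any]:  # stateless: "sensor,value" -> (sensor, value)
    sensor, value = line.split(",")
    return [(sensor, float(value))]


def hot(r) -> List[Any]:  # stateless filter: keep readings above 30.0
    return [r] if r[1] > 30.0 else []


def alert(r) -> List[Any]:  # sink: record an alert message
    alerts.append(f"{r[0]} hot: {r[1]}")
    return []


def count_per_sensor(r) -> List[Any]:  # stateful sink: running count per sensor
    counts[r[0]] += 1
    return []


# Build the DAG: parse -> filter_hot -> {alert_sink, count_sink}
src = Operator("parse", parse)
flt = src.connect(Operator("filter_hot", hot))
flt.connect(Operator("alert_sink", alert))
flt.connect(Operator("count_sink", count_per_sensor))

# A finite prefix of what would be an unbounded stream in practice.
for line in ["s1,25.0", "s2,31.5", "s1,33.0", "s2,29.9", "s1,40.1"]:
    src.push(line)

print(alerts)        # ['s2 hot: 31.5', 's1 hot: 33.0', 's1 hot: 40.1']
print(dict(counts))  # {'s2': 1, 's1': 2}
```

In a geo-distributed deployment, each edge of such a graph may cross a long-distance network link, which is why operator placement and state management matter so much in the fog.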

The objective of this thesis is to propose alternative implementations that deliver maximum processing efficiency in a fog computing environment. This will require the PhD student to identify the main sources of inefficiency and to propose alternative techniques. For example, stream processing engines provide a variety of stateless or stateful operators that transform one or more input streams into one or more output streams. When the operator state cannot easily be partitioned between multiple replicas of the operator, it becomes necessary to replicate the state and to maintain its consistency every time it is updated. This requires costly communication when the replicas are geographically distributed. A possible approach to address this issue would be to exploit so-called "conflict-free replicated data types" [4] to better control the tradeoff between inter-replica synchronization and computation accuracy.
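
As an illustration of the CRDT idea, the sketch below implements a state-based grow-only counter (G-Counter), one of the simplest conflict-free replicated data types, as it might be used by a counting operator replicated across two fog sites. The class, the site names and the merge schedule are hypothetical; the thesis is not committed to this particular data type or to this way of using it.

```python
from typing import Dict


class GCounter:
    """State-based grow-only counter CRDT: one slot per replica."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.slots: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        # Each replica only ever updates its own slot, so no coordination is needed.
        if n < 0:
            raise ValueError("a G-Counter only supports non-negative increments")
        self.slots[self.replica_id] = self.slots.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.slots.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge whatever the order or number of merges.
        for rid, c in other.slots.items():
            self.slots[rid] = max(self.slots.get(rid, 0), c)


site_a = GCounter("site_a")  # hypothetical fog site
site_b = GCounter("site_b")  # hypothetical fog site
for _ in range(3):
    site_a.increment()
for _ in range(5):
    site_b.increment()

# Before synchronization each replica sees only its local events.
print(site_a.value(), site_b.value())  # 3 5

site_a.merge(site_b)
site_b.merge(site_a)
site_b.merge(site_a)  # a duplicate merge is harmless (idempotent)
print(site_a.value(), site_b.value())  # 8 8
```

Merging less often saves wide-area traffic, but between merges each replica reads only a partial count: this is one concrete form of the synchronization/accuracy tradeoff described above.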

This project will be conducted within the IRISA Myriads team which is working on the design of innovative infrastructures and middleware for future fog computing platforms [5]. The team leader, Guillaume Pierre, is also the coordinator of the FogGuru European project [6].

Required qualifications:

  • A master's degree in distributed systems and/or cloud computing.
  • Excellent programming skills in Linux environments.
  • Excellent communication and writing skills.
  • Good command of English.
  • Knowledge of the following technologies is not mandatory but will be considered as a plus:
    • Cloud resource scheduling.
    • Distributed container systems: Kubernetes, Docker Swarm.
    • Single-board computers such as the Raspberry Pi.
    • Python and shell scripting.
    • Revision control systems: git, svn.
    • Linux distributions: Debian, Ubuntu.

Note that knowledge of French is *not* required for this position.

Contract duration: 3 years, full time.

Start date: October 2021.

Location: Rennes is the capital city of Brittany, in the western part of France. It is easy to reach thanks to the high-speed train line to Paris. Rennes is a lively city and a major center for higher education and research. The job will take place within the INRIA/IRISA research center, which is internationally recognized for its research in the domain of information and communication sciences.

Bibliography

[1] F. Bonomi et al. Fog Computing and Its Role in the Internet of Things. In Proc. ACM MCC, 2012. https://conferences.sigcomm.org/sigcomm/2012/paper/mcc/p13.pdf

[2] Hamidreza Arkian, Guillaume Pierre, Johan Tordsson and Erik Elmroth. An Experiment-Driven Performance Model of Stream Processing Operators in Fog Computing Environments. In Proc. ACM/SIGAPP Symposium on Applied Computing (SAC), April 2020. https://hal.inria.fr/hal-02394396

[3] Apache Flink. https://flink.apache.org/

[4] Conflict-free replicated data type. https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type

[5] Myriads INRIA/IRISA team. https://team.inria.fr/myriads/

[6] FogGuru: Training the Next Generation of European Fog Computing Experts. H2020 ITN EID Marie Skłodowska-Curie project #765452. http://www.fogguru.eu/


Thesis supervisors

Name
Guillaume Pierre
Type of supervision
Thesis director
Research unit
UMR 6074
Keywords
fog computing, cloud computing, geo-distribution, data stream processing