FASTSYNC: Efficient clock synchronization in a datacenter

Submitted by Barbe MVONDO DJOB on
Start date of the PhD
As soon as possible
Place
Rennes
Laboratory
IRISA - UMR 6074
Description of the subject

Context

Clock synchronization is of utmost importance for applications running in a datacenter. It is critical for large-scale distributed systems (e.g., databases such as RocksDB and Spanner), where each server's clock reading (its epoch) determines the validity of a transaction. Clock synchronization is also critical for fault-tolerance protocols such as state machine replication (SMR), which rely on heartbeats to determine whether a replica is healthy. However, servers in a datacenter face numerous sources of clock inconsistency.

The main source of inconsistency is the network devices (mainly switches) that interconnect the servers. A switch performs several tasks, and its load varies continuously. As a result, packet processing can be delayed by a non-deterministic amount, which in turn delays the delivery of the packets needed for clock synchronization. Another source of inconsistency is the kernel network stack, which introduces significant delays for the user-space applications that must process synchronization packets. Despite existing techniques such as NTP (Network Time Protocol) and PTP (Precision Time Protocol), it remains hard to build a deterministic protocol that provides microsecond-level, fault-tolerant clock synchronization.
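To illustrate why variable switch delays hurt synchronization, the sketch below implements the two-way timestamp exchange used by NTP (and, in spirit, by PTP). The estimate assumes the forward and backward one-way delays are equal; extra queuing in one direction biases the computed offset by half of the asymmetry, and the four timestamps alone cannot reveal it. The function names `estimate_offset_and_delay` and `simulate_exchange` and all numeric values are hypothetical and illustrative; they are not part of the FASTSYNC design.

```python
def estimate_offset_and_delay(t1, t2, t3, t4):
    """Two-way time transfer, as in NTP.

    t1: request sent      (client clock)
    t2: request received  (server clock)
    t3: reply sent        (server clock)
    t4: reply received    (client clock)
    Returns (offset, round_trip_delay), where offset = server clock - client clock.
    The offset is exact only if forward and backward one-way delays are equal.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, forward_delay, backward_delay, server_hold=1.0):
    """Hypothetical helper: produce the four timestamps for a known true offset
    and given one-way delays (illustrative values in microseconds)."""
    t1 = 0.0
    t2 = t1 + forward_delay + true_offset       # arrival, read on the server clock
    t3 = t2 + server_hold                       # server processing time
    t4 = (t3 - true_offset) + backward_delay    # arrival, read on the client clock
    return t1, t2, t3, t4


if __name__ == "__main__":
    # Symmetric path: 10 us each way, true offset 50 us.
    print(estimate_offset_and_delay(*simulate_exchange(50.0, 10.0, 10.0)))  # (50.0, 20.0)
    # 30 us of extra switch queuing on the forward path only.
    print(estimate_offset_and_delay(*simulate_exchange(50.0, 40.0, 10.0)))  # (65.0, 50.0)
```

In the second exchange, 30 µs of one-sided queuing produces a 15 µs offset error even though the measured round-trip delay looks plausible, which is the kind of non-deterministic error that motivates avoiding the loaded network path.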
 

Objective

The main aim of the PhD is to propose a low-cost, deterministic, and fault-tolerant clock synchronization protocol that achieves microsecond-level precision for servers in a datacenter. Our key insight is to use a path other than the data network, so that synchronization is unaffected by the load on network devices. Our starting point is therefore a survey of the state of the art of usable alternative paths, such as Bluetooth. This survey will characterize each alternative path by parameters such as the additional hardware it requires, its cost, and its operating system support. Based on this survey, we will design a synchronization protocol that relies on an alternative path. We will then implement a prototype of the protocol, with an emphasis on efficient kernel drivers that exploit the new hardware and the alternative path. We intend to evaluate the resulting prototypes on both simulated and real datacenter testbeds.
 

Researchers

Name
Yerom-David Bromberg
Type of supervision
Director
Laboratory
UMR 6074

Name
Djob Mvondo
Type of supervision
Supervisor
Laboratory
UMR 6074
Contact
Name
Djob Mvondo
Email
barbe-thystere.mvondodjob@univ-rennes1.fr
Phone
0695396684
Keywords
System, Datacenter, Time synchronization protocols, Linux Kernel, Hardware