You are here

Towards probabilistic decentralized systematic design for large-scale privacy-preserving collaborative systems

Team and supervisors
Department / Team: 
Team Web Site:
PhD Director
François Taïani
Co-director(s), co-supervisor(s)
NameEmail addressPhone Number
François Taïani
+ 33 2 99 84 75 04
PhD subject


We increasingly rely on on-line systems for a growing range of activities and services, including social media, file sharing, collaborative platforms, or messaging. These services generate enormous amounts of user-related data, from personal photos to movie preferences. In order to cope with such large data volumes, most of these services are provided by large companies, and hosted on-line in powerful cloud platforms.

Unfortunately, this overwhelming reliance on the cloud comes at a strong price: users lose control over their own data, which is distributed over multiple data centers, often in foreign jurisdictions, making it difficult for most users to grasp, review, and assess the privacy, and confidentiality guarantees they can expect. In practice, user data hosted in the cloud is routinely harvested, often without explicit consent, either by cloud providers or third parties, a sad situation that is only partially mitigated by legal measures such as the GDPR.

Although data encryption is often proposed as a first line of defense for cloud-hosted data, it is unfortunately insufficient [7, 16, 9]. Metadata such as a user’ location and activity can still be tracked, including access control histories, thus revealing what each user shared with whom and when. The exposure of metadata seriously weakens user privacy, as famously emphasized by a former NSA and CIA director: “We kill people based on metadata.” [12]
Constructing a system that provides a high level of privacy protection while offering a rich range of services at scale is unfortunately highly challenging. The last two decades have witnessed many attempts at building privacy-preserving and privacy protection P2P applications [3, 1, 4, 11, 14]. These solutions often leverage solutions such as onion routing [2], logical overlays, or replication with some improvements to perform node discovery in a distributed manner. Unfortunately, they often present a monolithic and hard-to-reuse design, built at times on unproven mechanisms under unclear assumptions.

PhD Topic

Objective and Research Challenges

The objective of this PhD thesis is to explore the design, analysis, and implementation of modular probabilistic decentralized mechanisms for the systematic construction of large-scale Byzantine Fault-Tolerant Privacy Preserving distributed collaborative systems. We argue that large-scale decentralized collaborative systems face three core intertwined challenges if they are to scale, resist to attack while remaining open to new contributors, and protect the privacy of their users.

  1. 1. The must solve the tension between scale and coordination: strong coordination guarantees, such as those found in consensus protocols, or blockchain systems, can be extremely costly to enforce (be it in terms of overall performance, or of computing power). A number of works, including recent works on distributed ledgers [8], have shown that such strong guarantees are in fact no needed for a range of services and applications. One open question is how the range of known weaker coordination models (e.g. weaker consistency models) can be systematically adapted to an aggressive fault models including Byzantine nodes, and implemented using efficient algorithms while adopting a modular and principled approach.
  2. 2. Large-scale decentralized collaborative systems must arbitrate between their robustness to Byzantine behavior and their openness: Ideally such systems should be open (i.e. permission- less) in order to easily integrate new contributors. Unfortunately tolerating Byzantine behavior in such a setting is extremely difficult, as such open systems can be subject to large-scale Sybil attacks, and known deterministic Byzantine protocols usually to not work with unknown or dynamic participants. Current solutions to solve this dilemma either use some form of proof of work (PoW) such as Bitcoin [10] or Proof of Stake (PoS) such as Algorand [6], but both strategies tend to self-defeat the general philosophy of an open decentralized systems. A fascinating question is whether early proposals to use metastable [15] mechanisms to solve this challenge can be generalized to any kind of collaborative system.
  3. 3. Finally, these systems must protect their user’s privacy while insuring some form of collaboration, and hence allowing for some leak of information. A highly popular approach to privacy protection in information sharing systems use onion routing, a technique popularized for instance by the Tor network. Practical systems that rely on onion routing tend to construct static routes that are no necessarily adapted to highly dynamic and open membership. They also tend to limit themselves to communication services, and cannot immediately be combined with richer distributed paradigms such as commutative replicated data types (CRDTs) or, more generally, weakly consistent distributed data structures.

Envisaged Approach

We propose a three-pronged strategy to address the above challenges, which we detail below. We are particularly interested in exploring potential synergies between solutions in the three dimensions we have outlined, and we expect progress in one area to naturally lead to new questions and avenues in the two others.

  • To better approach the tension between scale and coordination in a Byzantine Fault model, we plan to develop an improved theoretical understanding of consistency conditions under Byzantine Faults. A large number of consistency conditions [13] have been proposed to describe the behaviour of a replicated data structure (i.e. a data structure whose interface is made available on many nodes) but these conditions usually assume that nodes are correct (or might possibly crash). In this PhD, we would like to study how these conditions can be automatically extended to the Byzantine case in order to lay the theoretical foundations for the realization of Byzantine tolerant replicated systems with weaker conditions than traditional state machine replication approach.
  • To contribute to reconcile Byzantine fault-tolerance with open and dynamic system beyond the limitation of today’s blockchain techniques, we would like to investigate how a small-scale permissioned distributed algorithm that is tolerant to Byzantine fault-tolerance could be automatically transformed into an algorithm able to execute efficiently on a large-scale permission-less system with equivalent probabilistic correctness guarantees. Our intuition is that current early proposals that exploit epidemic metastable phenomena [15] as well as unbiased sampling to test the likelihood of specific global predicates [8, 5] can be generalized to a much larger family of step-based algorithms. These techniques however assume a perfect sampling ser- vice, that is immune to Byzantine faults, which is in practice hard to achieve. To better understand this aspect, we would also like to investigate the robustness of this transformation and its related techniques to imperfect and possibly partially corruptible sampling services.
  • Finally, in order to reconcile privacy and collaboration in large scale decentralized systems, we envisage to explore new techniques of probabilistic onion-routing in which an onion route no longer consists of a simple sequence of relays but is constructed of a sequence of recursively encrypted relaying scripts, thus extending our early works in this area. Our idea is that each script would encode a set of dynamically computed possible relays, based on the trust, and availability of currently on-line peers, and of the targeted reliability of the realized route. We also would like to explore how onion-routing techniques can be integrated more intuitively into collaborative objects by investigating their use withing weakly consistent distributed datastructures to protect parts of the datastructure’s state against unintended disclosure to malicious or even only curious users.

[1] Krista Bennett, Christian Grothoff, Tzvetan Horozov, and Ioana Patrascu. Efficient sharing of encrypted data. In Lynn Batten and Jennifer Seberry, editors, Information Security and Privacy, pages 107–120. Springer Berlin Heidelberg, 2002.

[2] David L. Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. Commun. ACM, 24(2):84–90, 2 1981.

[3] Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A distributed anony- mous information storage and retrieval system. In International Workshop on Designing Privacy Enhancing Technologies: Design Issues in Anonymity and Unobservability, pages 46–66. Springer- Verlag New York, Inc., 2001.

[4] Michael J. Freedman and Robert Morris. Tarzan: A peer-to-peer anonymizing network layer. In Proceedings of the 9th ACM Conference on Computer and Communications Security, CCS ’02, 2002.

[5] Davide Frey, Achour Mostefaoui, Matthieu Perrin, Pierre-Louis Roman, and Fran ̧cois Ta ̈ıani. Speed for the elite, consistency for the masses: differentiating eventual consistency in large-scale distributed systems. In 35th Symposium on Reliable Distributed Systems (SRDS 2016), September 2016.

[6] Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nickolai Zeldovich. Algorand: Scaling byzantine agreements for cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles, pages 51–68. ACM, 2017.

[7] Jennifer Stisa Granick. We Kill People Based on Metadata, page 53–66. Cambridge University Press, 2017.

[8] Rachid Guerraoui, Petr Kuznetsov, Matteo Monti, Matej Pavlovic, and Dragos-Adrian Seredinschi. AT2: asynchronous trustworthy transfers. CoRR, abs/1812.10844, 2018.

[9] Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg. Side channels in cloud services: Dedu- plication in cloud storage. IEEE Security & Privacy, 8(6):40–47, 2010.

[10] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008.

[11] Arjun Nambiar and Matthew Wright. Salsa: A structured approach to large-scale anonymity. In Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS ’06, pages 17–26, 2006.

[12] John Naughton. Death by drone strike, dished out by algorithm. The Guardian, 2 2016.

[13] Matthieu Perrin. Sp ́ecification des objets partag ́es dans les syst`emes r ́epartis sans-attente. (Specifi- cation of shared objects in wait-free distributed systems). PhD thesis, University of Nantes, France, 2016.

[14] Marc Rennhard and Bernhard Plattner. Introducing morphmix: Peer-to-peer based anonymous internet usage with collusion detection. In Proceedings of the 2002 ACM Workshop on Privacy in the Electronic Society, WPES ’02, 2002.

[15] Team Rocket. Snowflake to avalanche: A novel metastable consensus protocol family for cryptocur- rencies, 2018.

[16] Jelle van den Hooff, David Lazar, Matei Zaharia, and Nickolai Zeldovich. Vuvuzela: scalable private messaging resistant to traffic analysis. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015, Monterey, CA, USA, October 4-7, 2015, pages 137–152, 2015.

Work start date: 
Decentralization, Privacy Protection, Distributed Algorithms, Byzantine Failures, Consis- tency, Scalability
IRISA - Campus universitaire de Beaulieu, Rennes