Privacy-Preserving Decentralized Learning Through Model Fragmentation and Private Aggregation

Published on
Team
Thesis start date (if known)
October 2023
Location
Rennes
Research unit
IRISA - UMR 6074
Description of the thesis subject

Machine learning consists of producing (learning) a computer-based function (usually referred to as a model) from examples (training data). The accuracy and quality of the resulting model are usually directly related to the size of the training data, but training on very large datasets raises at least two problems. First, very large training sets require substantial computing power to train the model in a reasonable time. Second, as machine learning is increasingly applied to sensitive and personal data (e.g. health records, personal messages, user preferences, browsing histories), exposing this data to the learning algorithm raises far-reaching privacy concerns and carries substantial risks of privacy violations.

These two problems have prompted the emergence of a range of distributed learning techniques, which seek to distribute the learning effort across many machines to scale the learning process and to limit privacy leaks by keeping sensitive data on the learning devices. Two related strategies have, in particular, emerged to address these challenges: Federated Learning, initially promoted by Google, and Decentralized Learning, which forgoes any centralized entity in the learning process entirely. Unfortunately, recent works have shown that, despite their promise, both of these approaches can be subject to privacy attacks, such as membership inference, data reconstruction, or attribute inference, which make it possible for malicious participants to access private and/or sensitive information through the learning process.

This PhD aims to improve the privacy protection provided by decentralized learning by exploring how model fragmentation, a technique developed by the WIDE team within the ANR Pamela project (2016-2020), can be combined with private aggregation and random peer sampling, two strategies successfully applied in P2P networks.
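As a minimal sketch of the fragmentation idea, the snippet below splits a model's (flattened) parameter vector into fragments that can be scattered across different peers and later reassembled. The function names and the flat-vector representation are illustrative assumptions, not the Pamela project's actual API:

```python
import numpy as np

def fragment_model(params: np.ndarray, n_fragments: int) -> list[np.ndarray]:
    """Split a flat parameter vector into roughly equal fragments.

    Each fragment can be stored or exchanged by a different peer, so no
    single peer (other than the owner) needs to see the whole model.
    """
    return np.array_split(params, n_fragments)

def reconstruct_model(fragments: list[np.ndarray]) -> np.ndarray:
    """Reassemble fragments (in order) into the full parameter vector."""
    return np.concatenate(fragments)

# Toy example: a 12-parameter "model" scattered over 4 peers.
params = np.arange(12, dtype=float)
fragments = fragment_model(params, 4)
assert len(fragments) == 4
assert np.array_equal(reconstruct_model(fragments), params)
```

In a real decentralized setting, fragment-to-peer assignment would be driven by the peer-sampling service, and fragments would be aggregated independently before reconstruction.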

More concretely, the PhD will address the three following research questions:

  1. How to fragment, distribute, and reconstruct decentralized models?
  2. How to combine fragmentation and privacy-preserving averaging without disrupting learning?
  3. How to characterize the gains of protection obtained from fragmentation and privacy-preserving averaging?
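To make the second question concrete, here is a minimal sketch of privacy-preserving averaging in the spirit of cancelling-noise gossip averaging (cf. [19]): for each pair of neighbouring peers, one adds a random offset to its value and the other subtracts the same offset, so the network-wide sum, and hence the average, is preserved exactly while the individual values actually exchanged are masked. All names and the graph representation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_values(values: list[float], edges: list[tuple[int, int]]) -> list[float]:
    """Apply cancelling pairwise masks over the communication graph.

    For each edge (i, j), peer i adds a random offset and peer j subtracts
    the same offset: the offsets cancel in the global sum, so the average
    is unchanged, but each published value no longer reveals its input.
    """
    masked = list(values)
    for i, j in edges:
        delta = rng.normal(scale=10.0)
        masked[i] += delta
        masked[j] -= delta
    return masked

values = [1.0, 2.0, 3.0, 4.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a ring of 4 peers
masked = mask_values(values, edges)
# Individual masked values differ from the inputs, but the average survives.
assert masked != values
assert abs(sum(masked) / len(masked) - sum(values) / len(values)) < 1e-9
```

An open question for the PhD is precisely how such masking interacts with fragmentation, since each fragment would be averaged over a different, dynamically sampled set of peers.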

In terms of methodology, the PhD will combine the design, implementation, and characterization of research prototypes, to gain practical insights, with algorithmic and statistical models on which we will carry out systematic analysis and reasoning.

Bibliography

[1] J. Konecny, H. B. McMahan, D. Ramage, and P. Richtárik. "Federated Optimization: Distributed Machine Learning for On-Device Intelligence". In: CoRR abs/1610.02527 (2016). arXiv:1610.02527.
[2] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar. "Federated multi-task learning". In: NIPS. 2017, pp. 4424-4434.
[3] F. Chen, Z. Dong, Z. Li, and X. He. "Federated Meta-Learning for Recommendation". In: arXiv preprint arXiv:1802.07876 (2018).
[4] V. Zantedeschi, A. Bellet, and M. Tommasi. "Fully Decentralized Joint Learning of Personalized Models and Collaboration Graphs". In: AISTATS. Vol. 108. Proceedings of Machine Learning Research. PMLR, 2020, pp. 864-874.
[5] E. Cyffers and A. Bellet. "Privacy Amplification by Decentralization". In: AISTATS. Vol. 151. Proceedings of Machine Learning Research. PMLR, 2022, pp. 5334-5353.
[6] I. Colin, A. Bellet, J. Salmon, and S. Clémençon. "Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions". In: ICML. Vol. 48. JMLR Workshop and Conference Proceedings. JMLR.org, 2016, pp. 1388-1396.
[7] X. Lian, C. Zhang, H. Zhang, C.-J. Hsieh, W. Zhang, and J. Liu. "Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent". In: NIPS. 2017, pp. 5330-5340.
[8] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konecny, S. Mazzocchi, H. B. McMahan, et al. "Towards Federated Learning at Scale: System Design". In: Proceedings of the 2nd SysML Conference (2019).
[9] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth. "Practical secure aggregation for privacy-preserving machine learning". In: CCS. ACM. 2017, pp. 1175-1191.
[10] B. Zhao, K. R. Mopuri, and H. Bilen. "iDLG: Improved Deep Leakage from Gradients". In: CoRR abs/2001.02610 (2020). arXiv: 2001.02610.
[11] Z. Wang, M. Song, Z. Zhang, Y. Song, Q. Wang, and H. Qi. "Beyond inferring class representatives: User-level privacy leakage from federated learning". In: INFOCOM. IEEE. 2019, pp. 2512-2520.
[12] L. Zhu, Z. Liu, and S. Han. "Deep Leakage from Gradients". In: NeurIPS. 2019, pp. 14747-14756.
[13] J. Geiping, H. Bauermeister, H. Dröge, and M. Moeller. "Inverting Gradients - How easy is it to break privacy in federated learning?" In: NeurIPS. 2020.
[14] M. Nasr, R. Shokri, and A. Houmansadr. "Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning". In: 2019 IEEE Symposium on Security and Privacy (SP). May 2019, pp. 739-753. arXiv:1812.00910.
[15] D. Pasquini, M. Raynal, and C. Troncoso. "On the Privacy of Decentralized Machine Learning". In: CoRR abs/2205.08443 (2022). arXiv: 2205.08443.
[16] I. Driouich, C. Xu, G. Neglia, F. Giroire, and E. Thomas. "Local Model Reconstruction Attacks in Federated Learning and their Uses". In: CoRR abs/2210.16205 (2022). arXiv:2210.16205.
[17] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. "Membership Inference Attacks Against Machine Learning Models". In: 2017 IEEE Symposium on Security and Privacy, S&P. 2017, pp. 3-18.
[18] A. Bouchra Pilet, D. Frey, and F. Taïani. "Simple, Efficient and Convenient Decentralized Multi-Task Learning for Neural Networks". In: IDA 2021 - 19th Symposium on Intelligent Data Analysis. Vol. 12695. Lecture Notes in Computer Science. Porto, Portugal: Springer, Apr. 2021.
[19] A. B. Pilet, D. Frey, and F. Taïani. "Robust Privacy-Preserving Gossip Averaging". In: SSS. Vol. 11914. Lecture Notes in Computer Science. 2019, pp. 38-52.
[20] M. Jelasity, A. Montresor, and O. Babaoglu. "Gossip-Based Aggregation in Large Dynamic Networks". In: ACM Transactions on Computer Systems 23.3 (2005).
[21] M. Jelasity, S. Voulgaris, R. Guerraoui, A.-M. Kermarrec, and M. van Steen. "Gossip-based Peer Sampling". In: TOCS 25.3 (2007).
[22] A. B. Pilet, D. Frey, and F. Taïani. "Foiling Sybils with HAPS in Permissionless Systems: An Address-based Peer Sampling Service". In: ISCC. IEEE, 2020, pp. 1-6.
[23] D. Frey, R. Guerraoui, A.-M. Kermarrec, A. Rault, F. Taïani, and J. Wang. "Hide & Share: Landmark-Based Similarity for Private KNN Computation". In: DSN. 2015, pp. 263-274.
[24] E. Bortnikov, M. Gurevich, I. Keidar, G. Kliot, and A. Shraer. "Brahms: Byzantine resilient random membership sampling". In: Comput. Networks 53.13 (2009), pp. 2340-2359.
[25] J.-C. Fabre, Y. Deswarte, and B. Randell. "Designing Secure and Reliable Applications using Fragmentation-Redundancy-Scattering: An Object-Oriented Approach". In: EDCC. Vol. 852. Lecture Notes in Computer Science. 1994, pp. 21-38.

List of thesis supervisors

Surname, First name
Taïani, François
Type of supervision
Co-supervisor
Research unit
IRISA
Team

Surname, First name
Frey, Davide
Type of supervision
Thesis director
Research unit
IRISA
Team

Surname, First name
Gaudel, Romaric
Type of supervision
2nd co-director (optional)
Research unit
IRISA
Team
Contacts
Name
Taïani, François
Email
francois.taiani@irisa.fr
Phone
+33 (0) 2 99 84 75 04
Name
Frey, Davide
Email
davide.frey@inria.fr
Phone
+33 (0) 2 99 84 75 65
Name
Gaudel, Romaric
Email
romaric.gaudel@irisa.fr
Phone
+33 (0) 2 99 84 72 34
Keywords
Machine Learning, Decentralization, Privacy Protection, Middleware, Randomized Algorithms