Abstract: The thesis focuses on compressive learning, a paradigm for large-scale machine learning in which the whole dataset is compressed down to a single vector of randomized generalized moments, called the sketch. An approximate solution of the learning task at hand is then estimated from this sketch, without using the initial data. This framework is by nature suited for learning from distributed collections or data streams, and has already been instantiated with success on several unsupervised learning tasks such as k-means clustering, density fitting using Gaussian mixture models, or principal component analysis. We improve this framework in multiple directions. First, it is shown that perturbing the sketch with additive noise is sufficient to derive (differential) privacy guarantees. Sharp bounds on the noise level required to obtain a given privacy level are provided, and the proposed method is shown empirically to compare favourably with state-of-the-art techniques. Then, the compression scheme is modified to leverage structured random matrices, which reduce the computational cost of the framework and make it possible to learn on high-dimensional data. (Other contributions proposed in the context of the thesis which will not be covered during the defense include the design of a new algorithm based on message passing techniques to learn from the sketch for the k-means clustering problem, and some considerations relative to the design of the sketching operator, opening the way for an extension of the framework to new learning tasks.)
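To make the idea of a sketch concrete, here is a minimal illustration in Python, not the thesis's implementation. It assumes random Fourier features with Gaussian frequencies as the "randomized generalized moments". The function names (`draw_frequencies`, `sketch`, `merge`, `noisy_sketch`) are hypothetical, and `noise_std` is an arbitrary placeholder: it is not calibrated to the privacy bounds derived in the thesis.

```python
import numpy as np

def draw_frequencies(dim, sketch_size, scale, rng):
    # Assumption: i.i.d. Gaussian frequency vectors, one per sketch entry.
    return rng.normal(0.0, scale, size=(dim, sketch_size))

def sketch(X, Omega):
    # Empirical average of the complex exponentials exp(i * <omega_j, x>)
    # over all samples x: one vector of length sketch_size for the dataset.
    return np.exp(1j * (X @ Omega)).mean(axis=0)

def merge(s1, n1, s2, n2):
    # Sketches are averages, so two parts of a dataset (distributed or
    # streamed) combine through a weighted mean of their sketches.
    return (n1 * s1 + n2 * s2) / (n1 + n2)

def noisy_sketch(X, Omega, noise_std, rng):
    # Additive perturbation of the sketch. The Gaussian noise and its
    # level are placeholders, not a calibrated privacy mechanism.
    s = sketch(X, Omega)
    noise = rng.normal(0.0, noise_std, s.shape) + 1j * rng.normal(0.0, noise_std, s.shape)
    return s + noise

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))           # toy dataset: 1000 points in dimension 5
Omega = draw_frequencies(5, 20, 1.0, rng)

s_full = sketch(X, Omega)                # sketch of the whole dataset
s_merged = merge(sketch(X[:400], Omega), 400, sketch(X[400:], Omega), 600)
s_private = noisy_sketch(X, Omega, 0.01, rng)
```

Because the sketch is a plain average, `s_full` and `s_merged` agree up to floating-point error, which is what makes the approach convenient for distributed collections and data streams.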
Antoine Chatalic, a PhD student in the PANAMA team, will defend his thesis on Thursday, 19 November 2020 at 2:00 pm. The defense will be held entirely online.
Here is the link to join the defense. We ask that you mute your microphone and, if need be, turn off your camera.