
Efficient Algorithms for Large-Scale Sketched Acoustic Learning

Team and supervisors
Department / Team: 
Team website: 
Thesis director
Rémi Gribonval
Co-director(s), co-supervisor(s)
Antoine Deleforge
Thesis subject
  • 1. Context

Computational Auditory Scene Analysis (CASA) consists in analyzing audio recordings in order to infer information about the captured auditory scene. Examples include: finding the location of speakers, musical instruments, or other sound sources in a room; extracting the individual sound produced by a source of interest from the mixture; verifying/determining the identity/nature of each sound source; transcribing the content of each source (speech to text, or music to score), etc. CASA is rooted both in signal processing and machine learning. It exploits tools such as: Wiener filtering for denoising and source separation; Gaussian Mixture Modeling (GMM) or Non-Negative Matrix Factorization (NMF) to model the spectral variability of speech or music signals and mixtures; independent component analysis (ICA) and Gaussian Locally Linear Mapping (GLLiM) [1] for sound source localization; or Bayesian networks, to infer music structure at various time scales within a recording, or within collections of recordings.

The increasing availability of very large training audiovisual collections or streams, coupled with the high dimensionality of audiovisual features, opens many new opportunities and computational challenges for CASA. To leverage the opportunities offered by the unprecedented information content of such distributed data streams, as well as of large collections of several billion items, while controlling the required computational resources, there is a need to develop radically new techniques and concepts for large-scale machine learning.

The PANAMA team has developed a new learning framework called sketched learning. Rooted in compressive sensing [2], the approach leverages recent machine learning techniques such as random features [3]. Its principle is to compress the training collection (or the data stream) before even starting to learn, by computing a so-called sketch, which is the empirical average of some well chosen (often random) nonlinear function of the training samples. The ability to sketch while preserving the information needed to learn has been demonstrated, both theoretically and with empirical algorithms gathered in a toolbox [4], on learning problems such as GMM training [5], k-means clustering [6,7], or Principal Component Analysis (PCA) [7].
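As a concrete illustration, a minimal sketch based on random Fourier features (the kind of map used for sketched GMMs and clustering in [5, 6]) can be computed in a few lines of NumPy. The sizes, seeds, and variable names below are illustrative choices, not part of the cited toolbox:

```python
import numpy as np

def sketch(X, W):
    """Random Fourier feature sketch of the dataset X.

    X : (n, d) array of n training samples in dimension d.
    W : (m, d) array of random frequency vectors (e.g. Gaussian).
    Returns a complex vector of size m: the empirical average of the
    nonlinear map x -> exp(i <w_j, x>) over all samples.
    """
    proj = X @ W.T                 # (n, m) inner products <w_j, x_i>
    return np.exp(1j * proj).mean(axis=0)

rng = np.random.default_rng(0)
n, d, m = 1000, 2, 64              # 1000 samples in 2-D, sketch size 64
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))        # random Gaussian frequencies
z = sketch(X, W)
print(z.shape)                     # the whole collection summarized in m numbers
```

Whatever the number of samples n, the collection is reduced to the m complex entries of z, which is what makes the subsequent learning cost independent of n.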

As a proof of concept, sketching was successfully demonstrated on speaker verification, to learn a so-called “world model” on a training corpus of several thousand hours of speech. Where the classical EM approach requires multiple passes on the training collection, sketching was shown to lead to comparable speaker verification performance while allowing drastic dimension-reduction abilities: the whole collection (of several hundred million items) was summarized into a single vector, the sketch, of only a few kilobytes.

The goal of this thesis is to demonstrate the potential of sketched learning for a wide range of tasks involved in large-scale computational auditory scene analysis.

  • 2. Objectives

The main objectives of the thesis are:

  • to design flexible sketching techniques able to address the main learning tasks involved in CASA;
  • to address the computational bottlenecks of existing and proposed sketched learning techniques.

Standard massive real and simulated datasets for large-scale machine learning [10, 11, 12] will serve as a testbed for the algorithms resulting from the thesis.

The thesis is expected to lead to the development and consolidation of a robust know-how for parameterizing the proposed techniques, and to the empirical demonstration of their ability to provide privacy-aware efficient learning [13].

Flexible sketching. The potential of sketched learning has already been demonstrated on GMMs, clustering, and PCA. This opens new research opportunities, calling for revisiting machine learning tasks such as unsupervised clustering, supervised clustering, regression, dictionary learning, spectral clustering, etc. A first objective of the thesis is to demonstrate this potential by proposing new sketching techniques for a selection of learning tasks relevant to CASA.

As a priority we will extend sketched learning to address selected problems expressed in terms of matrix factorization (NMF, ICA and sparse dictionary learning) and high-dimensional regression (GLLiM). An envisioned approach is to replace shift-invariant random Fourier features (used for sketched GMM and sketched clustering) with random rectified linear unit (ReLU) features, a nonlinearity popularized by Deep Neural Networks (DNNs), since it leads to the needed rotation invariance. This is expected to consolidate existing connections between sketching and DNNs. A more ambitious goal will be to propose new multiscale "hierarchical" sketching techniques able to efficiently address multi-layer inference in Bayesian networks.
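A minimal sketch of what swapping the nonlinearity could look like, assuming the same averaging scheme as for Fourier features. The random ReLU feature map below is an illustrative stand-in, not the finalized design of the thesis:

```python
import numpy as np

def relu_sketch(X, W, b):
    """Sketch using random ReLU features instead of Fourier features.

    X : (n, d) samples; W : (m, d) random directions; b : (m,) random biases.
    Each sample is mapped through x -> max(0, Wx + b) and the resulting
    feature vectors are averaged over the collection.
    """
    return np.maximum(0.0, X @ W.T + b).mean(axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
W = rng.normal(size=(32, 3))           # random directions
b = rng.uniform(-1, 1, size=32)        # random biases
z = relu_sketch(X, W, b)
print(z.shape)
```

The only change with respect to a Fourier-feature sketch is the pointwise nonlinearity; the compression and averaging structure is identical.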

Scalable learning. Sketching is a highly promising avenue to design new scalable learning algorithms: once the sketch is computed, the cost of learning no longer depends on the size of the collection, leading to immediate computational benefits in the large-scale setting (both in terms of memory and computational resources). Besides, computing the sketch can intrinsically be performed in a distributed or online fashion.
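For instance, because a sketch is an empirical average, sketches computed independently on disjoint chunks of the data (on different machines, or over successive windows of a stream) can be merged exactly by a count-weighted average. A minimal illustration, with illustrative sizes:

```python
import numpy as np

def merge_sketches(sketches, counts):
    """Combine sketches of disjoint data chunks into the sketch of their union.

    Since a sketch is an empirical average, sketches of separate chunks
    merge by a count-weighted average, enabling distributed or online
    computation with no access to the raw samples.
    """
    return np.average(sketches, axis=0, weights=np.asarray(counts, dtype=float))

rng = np.random.default_rng(2)
phi = lambda X, W: np.exp(1j * X @ W.T)   # random Fourier feature map
W = rng.normal(size=(16, 4))
X = rng.normal(size=(300, 4))

# Split the data into three chunks and sketch each one independently
chunks = np.array_split(X, 3)
sketches = [phi(c, W).mean(axis=0) for c in chunks]
merged = merge_sketches(sketches, [len(c) for c in chunks])

# The merged sketch matches the sketch of the full collection
full = phi(X, W).mean(axis=0)
print(np.allclose(merged, full))           # True
```

The same identity gives an online update rule: a running mean of feature maps over a stream yields the exact sketch of all samples seen so far.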

To fully benefit from the scalability of sketched learning, we ideally target an implementation of the “learn from sketch” stage with cost (sub)linear in the complexity of the task. Yet, current implementations based on random Gaussian projection matrices still mainly scale quadratically with the ambient dimension. To address this computational bottleneck, an envisioned approach will combine:

  • the design of sketching mechanisms involving randomized fast transforms [8, 9] rather than random Gaussian matrices, which is expected to speed up both the sketching stage and the learning stage;
  • the implementation of efficient aggregation mechanisms for the sketching stage through the use of modern infrastructures for distributed computing (Hadoop, Spark, ...);
  • the analysis of the computational bottlenecks of the optimization algorithms used at the learning stage, to improve their resource-efficiency and scalability, e.g. using stochastic gradient methods.
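To illustrate the first point, a structured projection of the form HD (a random ±1 diagonal D followed by a Walsh-Hadamard transform H, one building block of fast-transform constructions such as [8]) replaces the O(d²) dense matrix-vector product with an O(d log d) transform and O(d) storage. The implementation below is a simplified illustration, not the construction the thesis will adopt:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, O(d log d) per vector.

    x : (..., d) array with d a power of two. Applies the unnormalized
    +-1 Hadamard matrix H (which satisfies H @ H = d * I).
    """
    x = np.array(x, dtype=float)
    d = x.shape[-1]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b          # butterfly: sum
            x[..., i + h:i + 2 * h] = a - b  # butterfly: difference
        h *= 2
    return x

def structured_projection(X, signs):
    """Replace a dense Gaussian projection with the structured product H D x,
    where D is a random +-1 diagonal: no d x d matrix is ever stored."""
    return fwht(X * signs)

rng = np.random.default_rng(3)
d = 8
X = rng.normal(size=(5, d))
signs = rng.choice([-1.0, 1.0], size=d)     # the diagonal D
Z = structured_projection(X, signs)
print(Z.shape)                               # (5, 8)
```

In practice such HD blocks are chained and combined with random permutations or diagonals to better mimic a Gaussian matrix, which is exactly the design space the thesis would explore.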

Privacy-preservation has been identified as a strong side-effect of sketched learning, since sketches consist of aggregated information from all items of the training collection / stream, in stark contrast with traditional learning mechanisms where full access to each item of the training collection is given to a potentially adversarial learner. We expect to demonstrate empirically that sketched learning remains efficient when combined with standard privacy mechanisms such as the addition of “Laplacian noise” [13] to the sketches of the individual elements.
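A toy illustration of this mechanism, assuming random Fourier features: since each item's feature map has entries of modulus 1, its influence on the average of n items is bounded by 2/n per coordinate, and per-coordinate Laplace noise of a matching scale can mask any individual contribution. The sensitivity bound and noise calibration below are illustrative, not a formal privacy proof:

```python
import numpy as np

def private_sketch(X, W, epsilon):
    """Random Fourier sketch with Laplace noise calibrated to its sensitivity.

    Each sample maps to exp(i <w, x>), whose entries have modulus 1, so
    changing one of the n items moves each coordinate of the average by at
    most 2/n. Laplace noise of scale 2/(n*epsilon) on the real and
    imaginary parts masks any single item's contribution (illustrative
    calibration; a real deployment needs a careful sensitivity analysis).
    """
    n = X.shape[0]
    z = np.exp(1j * X @ W.T).mean(axis=0)
    scale = 2.0 / (n * epsilon)
    rng = np.random.default_rng()
    noise = (rng.laplace(0.0, scale, size=z.shape)
             + 1j * rng.laplace(0.0, scale, size=z.shape))
    return z + noise

rng = np.random.default_rng(4)
X = rng.normal(size=(10000, 2))
W = rng.normal(size=(32, 2))
z_priv = private_sketch(X, W, epsilon=1.0)
print(z_priv.shape)
```

Because the noise scale shrinks as 1/n, larger collections pay a vanishing accuracy price for the same privacy budget, which is what makes the combination with sketching attractive.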

  • 3. References

[1] Antoine Deleforge, Florence Forbes, and Radu Horaud. High-dimensional regression with Gaussian mixtures and partially-latent response variables. Statistics and Computing, 25(5):893–911, 2015.

[2] Simon Foucart and Holger Rauhut. A Mathematical Introduction to Compressive Sensing. Springer, May 2012.

[3] Ali Rahimi and Benjamin Recht. Random Features for Large-Scale Kernel Machines. In Advances in Neural Information Processing Systems (NIPS), 2007.

[4] Nicolas Keriven, Nicolas Tremblay, and Rémi Gribonval. SketchMLBox: a toolbox for sketched machine learning, 2016.

[5] Nicolas Keriven, Anthony Bourrier, Rémi Gribonval, and Patrick Perez. Sketching for Large-Scale Learning of Mixture Models. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.

[6] Nicolas Keriven, Nicolas Tremblay, Yann Traonmilin, and Rémi Gribonval. Compressive K-means. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, United States, March 2017.

[7] Rémi Gribonval, Gilles Blanchard, Nicolas Keriven, and Yann Traonmilin. Random Moments for Sketched Statistical Learning. Submitted, October 2016.

[8] Krzysztof Choromanski and Vikas Sindhwani. Recycling Randomness with Structure for Sublinear time Kernel Expansions. May 2016.

[9] Luc Le Magoarou and Rémi Gribonval. Flexible multi-layer sparse approximations of matrices and applications. IEEE Journal of Selected Topics in Signal Processing, 2016.

[10] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: a large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

[11] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), 2011.

[12] Clément Gaultier, Saurabh Kataria, and Antoine Deleforge. VAST: The Virtual Acoustic Space Traveler Dataset. In International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Grenoble, France, February 2017.

[13] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Privacy Aware Learning. October 2012.

Start date: 
Fall 2017
Keywords: 
Large-scale learning; computational auditory scene analysis; sketching; machine learning; signal processing
IRISA - Campus universitaire de Beaulieu, Rennes