
PostDoc Employment Offer

Polyphonic sound description for music information retrieval


Subject description



Postdoctoral activity description

The success of online music stores and radio services (such as iTunes and Last.fm) has led to a quest for smart content distribution services. The development of advanced music information retrieval algorithms would be a major breakthrough in this context, allowing, for instance, accurate tagging, style classification or retrieval by similarity of audio clips.

Most music information retrieval algorithms rely on global low-level features, such as Mel-Frequency Cepstral Coefficients (MFCCs) or Pitch Class Profiles (PCPs), which model all instruments together. These algorithms are known to exhibit limited performance for a range of classification tasks. A few authors have shown that much higher performance can be achieved by extracting higher-level features for each instrument separately. Yet attempts to derive such features from polyphonic music signals have failed, due to the low accuracy of source separation and polyphonic music transcription systems. This low accuracy cannot be overcome, because individual instrument sounds are inherently uncertain when several of them mask each other, e.g. when a strong drum sound masks a weaker sound or when a strong pitched note masks a weaker note at an octave interval.

The purpose of this project is to extract features describing individual instruments or sets of instruments from a musical audio signal, without attempting to perfectly separate or transcribe them. Instead, a suitable probabilistic framework will be sought to express the uncertainty about each instrument signal and propagate it through the feature extraction stage, thus making the features more robust to source separation or polyphonic music transcription errors. This will also make it possible to express the uncertainty about the features themselves and to resolve it at a later stage using higher-level information or feedback from classification. Candidate tools to be used in this context include but are not limited to: harmonic models, Gaussian mixture models (GMMs), nonnegative matrix factorization (NMF), blind source separation (BSS), importance sampling and alternative approximate Bayesian inference techniques.

The proposed framework will be primarily applied and validated for the description and classification of musical audio within large databases. Target instruments to be characterized separately in this context include, for instance, singing voice, bass and drums. This is expected to lead to more accurate music classification capabilities, such as composer characterization, singer identification and retrieval by rhythmic similarity. More generally, this research is expected to have a strong impact on the music and audio information retrieval community by providing a new general research paradigm that extends the current one based on deterministic features.

Duration and funding:

This postdoctoral position is offered for one year and is renewable for a second year. It will be funded by the QUAERO project (www.quaero.org) and will involve close collaboration with academic and industrial partners.


Required skills and abilities

Prospective candidates should have a background in signal processing, machine learning, applied statistics or pattern recognition. Additional expertise in the fields of audio and music is welcome. Proficiency in Matlab programming is required. Experience with C would be an asset.

To apply, send a CV and a letter of motivation directly by email to emmanuel.vincent@irisa.fr and frederic.bimbot@irisa.fr, rather than through the website's reply form.





Project / Team

METISS

Priority topic

Postdoctoral researcher

Contact

Mr Emmanuel VINCENT
INRIA
Campus Universitaire de Beaulieu
35042 Rennes Cedex

