Task: Underdetermined-speech and music mixtures

Participants: Alexey Ozerov and Emmanuel Vincent

Data processed: All mixtures, except the new 3-channel mixtures (test3.zip)

Algorithm description:

The submitted algorithms are particular instances of the general source separation framework described in [1]. These instances are implemented exactly as described in [1] (Section 5.B), except for a difference in initialization (described below). In [1], 8 different framework instances (configurations) were compared on the development dataset (see Table IV of [1]). Here, we repeated exactly the same experiments as in [1] (see Table IV) and, for each condition (defined by the mixing type, the source type (speech or music) and the microphone spacing), selected for submission the configuration performing best on the development data among the 8 tested ones.

Since the test2 subset of the test dataset contains mixtures with 4 cm and 20 cm microphone spacings, whereas the development dataset contains no mixtures with these spacings, for those mixtures we chose the configurations performing best on the mixtures with 5 cm and 1 m microphone spacings, respectively (the full mapping, including this fallback, is summarized in the lookup sketch at the end of this description).

Note also that, unlike in [1], for the synthetic convolutive and live recorded mixtures we first estimated the time differences of arrival (TDOAs) using the GCC-NONLIN algorithm with the MAX pooling function, as described in [2].
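For illustration only, a minimal Python sketch of this TDOA preprocessing step is given below. It implements a plain GCC-PHAT angular spectrum with MAX pooling over time frames and simple peak picking; the function name, parameters and the peak picking are assumptions made for the sketch, and it reproduces neither the nonlinear GCC-NONLIN weighting and clustering of [2] nor our Matlab code.

    import numpy as np

    def gcc_phat_tdoas(x1, x2, fs, n_sources, frame_len=1024, hop=512, max_lag=None):
        # Estimate n_sources TDOAs (in seconds) between channels x1 and x2 by
        # pooling per-frame GCC-PHAT cross-correlations with a MAX over frames.
        if max_lag is None:
            max_lag = frame_len // 2
        window = np.hanning(frame_len)
        lags = np.arange(-max_lag, max_lag + 1)
        pooled = np.full(lags.size, -np.inf)        # MAX pooling accumulator

        n_frames = (len(x1) - frame_len) // hop + 1
        for t in range(n_frames):
            seg1 = window * x1[t * hop:t * hop + frame_len]
            seg2 = window * x2[t * hop:t * hop + frame_len]
            X1 = np.fft.rfft(seg1, 2 * frame_len)
            X2 = np.fft.rfft(seg2, 2 * frame_len)
            cross = X1 * np.conj(X2)
            cross /= np.abs(cross) + 1e-12          # phase transform (PHAT weighting)
            cc = np.fft.irfft(cross)                # generalized cross-correlation
            cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))  # lags -max_lag..max_lag
            pooled = np.maximum(pooled, cc)         # MAX pooling over frames

        # Keep the n_sources largest local maxima of the pooled angular spectrum
        # ([2] additionally applies a nonlinearity and clustering, omitted here).
        is_peak = np.r_[False, (pooled[1:-1] > pooled[:-2]) & (pooled[1:-1] >= pooled[2:]), False]
        order = np.argsort(pooled[is_peak])[::-1][:n_sources]
        return lags[is_peak][order] / float(fs)

For a stereo mixture sampled at 16 kHz, gcc_phat_tdoas(x[:, 0], x[:, 1], 16000, n_sources=3) would return three candidate TDOAs, one per source.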
The following configurations were retained for the corresponding conditions (see Table IV of [1] for the specification of the configurations and conditions):

1. Condition -> Mixing: instantaneous, Sources: speech
   Retained configuration: Configuration 7 (Rank-1, constrained spectral structure, constrained temporal structure)
2. Condition -> Mixing: instantaneous, Sources: music
   Retained configuration: Configuration 5 (Rank-1, unconstrained spectral structure, constrained temporal structure)
3. Condition -> Mixing: synthetic convolutive, Sources: speech, Microphone spacing: 5 cm (or 4 cm)
   Retained configuration: Configuration 7 (Rank-1, constrained spectral structure, constrained temporal structure)
4. Condition -> Mixing: synthetic convolutive, Sources: speech, Microphone spacing: 1 m (or 20 cm)
   Retained configuration: Configuration 8 (Rank-2, constrained spectral structure, constrained temporal structure)
5. Condition -> Mixing: synthetic convolutive, Sources: music, Microphone spacing: 5 cm (or 4 cm)
   Retained configuration: Configuration 2 (Rank-2, unconstrained spectral structure, unconstrained temporal structure)
6. Condition -> Mixing: synthetic convolutive, Sources: music, Microphone spacing: 1 m (or 20 cm)
   Retained configuration: Configuration 3 (Rank-1, constrained spectral structure, unconstrained temporal structure)
7. Condition -> Mixing: live recorded, Sources: speech, Microphone spacing: 5 cm
   Retained configuration: Configuration 7 (Rank-1, constrained spectral structure, constrained temporal structure)
8. Condition -> Mixing: live recorded, Sources: speech, Microphone spacing: 1 m
   Retained configuration: Configuration 8 (Rank-2, constrained spectral structure, constrained temporal structure)
9. Condition -> Mixing: live recorded, Sources: music, Microphone spacing: 5 cm
   Retained configuration: Configuration 7 (Rank-1, constrained spectral structure, constrained temporal structure)
10. Condition -> Mixing: live recorded, Sources: music, Microphone spacing: 1 m
    Retained configuration: Configuration 2 (Rank-2, unconstrained spectral structure, unconstrained temporal structure)

Average running time: Our Matlab implementation takes about 9 minutes on a 2.2 GHz CPU to process a 10-second excerpt.

References:

[1] A. Ozerov, E. Vincent, and F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation," IEEE Transactions on Audio, Speech, and Language Processing (to appear).
[2] C. Blandin, A. Ozerov, and E. Vincent, "Multi-source TDOA estimation in reverberant audio using angular spectra and clustering," Signal Processing, special issue on Latent Variable Analysis and Signal Separation (to appear).
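For reference, the condition-to-configuration mapping above, including the fallback from the 4 cm and 20 cm spacings to the closest development spacings, can be summarized as a small lookup table. The Python sketch below is purely illustrative; the names are hypothetical and are not part of the submitted Matlab code.

    # Configuration of [1, Table IV] retained for each test condition (illustrative only).
    RETAINED_CONFIG = {
        # (mixing, sources, microphone spacing) -> configuration number
        ("instantaneous",         "speech", None):   7,
        ("instantaneous",         "music",  None):   5,
        ("synthetic convolutive", "speech", "5 cm"): 7,
        ("synthetic convolutive", "speech", "1 m"):  8,
        ("synthetic convolutive", "music",  "5 cm"): 2,
        ("synthetic convolutive", "music",  "1 m"):  3,
        ("live recorded",         "speech", "5 cm"): 7,
        ("live recorded",         "speech", "1 m"):  8,
        ("live recorded",         "music",  "5 cm"): 7,
        ("live recorded",         "music",  "1 m"):  2,
    }

    # test2 mixtures with 4 cm and 20 cm spacings are mapped to the closest
    # spacings available in the development dataset.
    SPACING_FALLBACK = {"4 cm": "5 cm", "20 cm": "1 m"}

    def retained_configuration(mixing, sources, spacing=None):
        spacing = SPACING_FALLBACK.get(spacing, spacing)
        return RETAINED_CONFIG[(mixing, sources, spacing)]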