Task: Underdetermined-speech and music mixtures

Participants: Alexey Ozerov and Emmanuel Vincent

Data processed: All mixtures, except the new 3-channel mixtures (test3.zip)

Algorithm description:

The submitted algorithms are particular instances of the general source separation framework described in [1]. These instances are implemented exactly as described in [1] (Section 5.B), except for a difference in initialization (described below). In [1], 8 different framework instances (configurations) were compared on the development dataset (see Table IV of [1]). Here, we repeated exactly the same experiments as in [1] (see Table IV) and, for each condition (defined by the mixing type, the source type (speech or music) and the microphone spacing), selected for submission the configuration performing best on the development data among the 8 tested ones.

Since the test2 subset of the test dataset contains mixtures with 4 cm and 20 cm microphone spacings, whereas the development dataset contains no mixtures with these spacings, for those mixtures we chose the configurations performing best on the mixtures with 5 cm and 1 m microphone spacings, respectively (the full mapping, including this fallback, is summarized in the lookup sketch at the end of this description).

Note also that, unlike in [1], for the synthetic convolutive and live recorded mixtures we first estimated the time differences of arrival (TDOAs) using the GCC-NONLIN algorithm with the MAX pooling function, as described in [2].
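For illustration only, a minimal Python sketch of this TDOA preprocessing step is given below. It implements a plain GCC-PHAT angular spectrum with MAX pooling over time frames and simple peak picking; the function name, parameters and the peak picking are assumptions made for the sketch, and it reproduces neither the nonlinear GCC-NONLIN weighting and clustering of [2] nor our Matlab code.

    import numpy as np

    def gcc_phat_tdoas(x1, x2, fs, n_sources, frame_len=1024, hop=512, max_lag=None):
        # Estimate n_sources TDOAs (in seconds) between channels x1 and x2 by
        # pooling per-frame GCC-PHAT cross-correlations with a MAX over frames.
        if max_lag is None:
            max_lag = frame_len // 2
        window = np.hanning(frame_len)
        lags = np.arange(-max_lag, max_lag + 1)
        pooled = np.full(lags.size, -np.inf)        # MAX pooling accumulator

        n_frames = (len(x1) - frame_len) // hop + 1
        for t in range(n_frames):
            seg1 = window * x1[t * hop:t * hop + frame_len]
            seg2 = window * x2[t * hop:t * hop + frame_len]
            X1 = np.fft.rfft(seg1, 2 * frame_len)
            X2 = np.fft.rfft(seg2, 2 * frame_len)
            cross = X1 * np.conj(X2)
            cross /= np.abs(cross) + 1e-12          # phase transform (PHAT weighting)
            cc = np.fft.irfft(cross)                # generalized cross-correlation
            cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))  # lags -max_lag..max_lag
            pooled = np.maximum(pooled, cc)         # MAX pooling over frames

        # Keep the n_sources largest local maxima of the pooled angular spectrum
        # ([2] additionally applies a nonlinearity and clustering, omitted here).
        is_peak = np.r_[False, (pooled[1:-1] > pooled[:-2]) & (pooled[1:-1] >= pooled[2:]), False]
        order = np.argsort(pooled[is_peak])[::-1][:n_sources]
        return lags[is_peak][order] / float(fs)

For a stereo mixture sampled at 16 kHz, gcc_phat_tdoas(x[:, 0], x[:, 1], 16000, n_sources=3) would return three candidate TDOAs, one per source.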
The following configurations were retained for the corresponding conditions (see Table IV of [1] for the specification of the configurations and conditions):

1. Condition -> Mixing: instantaneous, Sources: speech
   Retained configuration: Configuration 7 (Rank-1, constrained spectral structure, constrained temporal structure)
2. Condition -> Mixing: instantaneous, Sources: music
   Retained configuration: Configuration 5 (Rank-1, unconstrained spectral structure, constrained temporal structure)
3. Condition -> Mixing: synthetic convolutive, Sources: speech, Microphone spacing: 5 cm (or 4 cm)
   Retained configuration: Configuration 7 (Rank-1, constrained spectral structure, constrained temporal structure)
4. Condition -> Mixing: synthetic convolutive, Sources: speech, Microphone spacing: 1 m (or 20 cm)
   Retained configuration: Configuration 8 (Rank-2, constrained spectral structure, constrained temporal structure)
5. Condition -> Mixing: synthetic convolutive, Sources: music, Microphone spacing: 5 cm (or 4 cm)
   Retained configuration: Configuration 2 (Rank-2, unconstrained spectral structure, unconstrained temporal structure)
6. Condition -> Mixing: synthetic convolutive, Sources: music, Microphone spacing: 1 m (or 20 cm)
   Retained configuration: Configuration 3 (Rank-1, constrained spectral structure, unconstrained temporal structure)
7. Condition -> Mixing: live recorded, Sources: speech, Microphone spacing: 5 cm
   Retained configuration: Configuration 7 (Rank-1, constrained spectral structure, constrained temporal structure)
8. Condition -> Mixing: live recorded, Sources: speech, Microphone spacing: 1 m
   Retained configuration: Configuration 8 (Rank-2, constrained spectral structure, constrained temporal structure)
9. Condition -> Mixing: live recorded, Sources: music, Microphone spacing: 5 cm
   Retained configuration: Configuration 7 (Rank-1, constrained spectral structure, constrained temporal structure)
10. Condition -> Mixing: live recorded, Sources: music, Microphone spacing: 1 m
    Retained configuration: Configuration 2 (Rank-2, unconstrained spectral structure, unconstrained temporal structure)

Average running time: Our Matlab implementation takes about 9 minutes on a 2.2 GHz CPU to process a 10-second excerpt.

References:

[1] A. Ozerov, E. Vincent, and F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation," IEEE Transactions on Audio, Speech, and Language Processing (to appear).
[2] C. Blandin, A. Ozerov, and E. Vincent, "Multi-source TDOA estimation in reverberant audio using angular spectra and clustering," Signal Processing, special issue on Latent Variable Analysis and Signal Separation (to appear).
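For reference, the condition-to-configuration mapping above, including the fallback from the 4 cm and 20 cm spacings to the closest development spacings, can be summarized as a small lookup table. The Python sketch below is purely illustrative; the names are hypothetical and are not part of the submitted Matlab code.

    # Configuration of [1, Table IV] retained for each test condition (illustrative only).
    RETAINED_CONFIG = {
        # (mixing, sources, microphone spacing) -> configuration number
        ("instantaneous",         "speech", None):   7,
        ("instantaneous",         "music",  None):   5,
        ("synthetic convolutive", "speech", "5 cm"): 7,
        ("synthetic convolutive", "speech", "1 m"):  8,
        ("synthetic convolutive", "music",  "5 cm"): 2,
        ("synthetic convolutive", "music",  "1 m"):  3,
        ("live recorded",         "speech", "5 cm"): 7,
        ("live recorded",         "speech", "1 m"):  8,
        ("live recorded",         "music",  "5 cm"): 7,
        ("live recorded",         "music",  "1 m"):  2,
    }

    # test2 mixtures with 4 cm and 20 cm spacings are mapped to the closest
    # spacings available in the development dataset.
    SPACING_FALLBACK = {"4 cm": "5 cm", "20 cm": "1 m"}

    def retained_configuration(mixing, sources, spacing=None):
        spacing = SPACING_FALLBACK.get(spacing, spacing)
        return RETAINED_CONFIG[(mixing, sources, spacing)]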