Alexey Ozerov
IRISA / INRIA - Rennes
alexey.ozerov@irisa.fr

A. Ozerov, F. Nesta, and E. Vincent

ALGORITHM:

1. 200 iterations of the Generalized Expectation-Maximization (GEM) algorithm from [1] for joint estimation of the flexible model. The following particular models were used:
   a. for speech sources: harmonic NMF (for spectral power) / full-rank spatial covariances (see [1] for details)
   b. for music sources: NMF with K = 4 (for spectral power) / full-rank spatial covariances (see [1] for details)
2. Given the estimated model, the sources are recovered via Wiener filtering, as described in [1].

INITIALIZATION:

1. Initial Time Differences Of Arrival (TDOAs) are estimated using the cumulative state coherence transform, as described in [2].
2. Initial filters are computed from the TDOAs as anechoic filters.
3. Initial source estimates (STFTs) are obtained via binary time-frequency masking of the mixture, given the initial filters.
4. Full-rank spatial covariances are initialized from the initial filters.
5. Initial NMF decompositions of the sources are computed from the power spectrograms of the initial source estimates by minimizing the Kullback-Leibler divergence.

COMPUTATIONAL TIME:

Our MATLAB implementation runs in up to 25 minutes on a 2.2 GHz CPU, depending on the particular configuration.

REFERENCES:

[1] A. Ozerov, E. Vincent, and F. Bimbot, "A General Modular Framework for Audio Source Separation," in Proc. 9th Int. Conf. on Latent Variable Analysis and Signal Separation (LVA/ICA 2010), 2010, submitted.
[2] F. Nesta, P. Svaizer, and M. Omologo, "Cumulative state coherence transform for a robust two-channel multiple source localization," in Proc. Int. Conf. on Independent Component Analysis and Blind Source Separation (ICA'09), 2009.
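Initialization steps 2 to 4 can be sketched in NumPy for the two-channel case as follows. This is a minimal sketch under assumptions that the description above does not state: the TDOAs are taken as given (the cumulative state coherence transform of [2] is not reproduced), each TDOA is the delay at microphone 2 relative to microphone 1, each time-frequency bin is assigned to the source whose anechoic filter best matches the mixture vector (a normalized matched-filter score), and each spatial covariance is the rank-1 outer product of its filter plus a small diagonal load `eps`. The function names and `eps` are hypothetical.

```python
import numpy as np

def anechoic_steering(tdoas, freqs):
    """Two-channel anechoic (pure-delay) filters built from TDOAs.

    tdoas: shape (J,), TDOA of each source in seconds (assumed: mic 2 vs mic 1).
    freqs: shape (F,), STFT bin centre frequencies in Hz.
    Returns A of shape (F, 2, J) with A[f, :, j] = [1, exp(-2i*pi*f*tau_j)].
    """
    tdoas = np.asarray(tdoas, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    F, J = freqs.size, tdoas.size
    A = np.ones((F, 2, J), dtype=complex)
    A[:, 1, :] = np.exp(-2j * np.pi * np.outer(freqs, tdoas))
    return A

def binary_mask_init(X, A):
    """Assign every time-frequency bin of the mixture to one source.

    X: mixture STFT, shape (F, N, 2).
    A: anechoic filters, shape (F, 2, J).
    Returns masks (F, N, J) with values in {0, 1} and
    source-image estimates (F, N, 2, J).
    """
    # Assumed rule: normalized matched-filter score |a_j^H x|^2 / ||a_j||^2.
    norms = np.sum(np.abs(A) ** 2, axis=1)              # (F, J)
    proj = np.einsum('fcj,fnc->fnj', A.conj(), X)       # (F, N, J)
    score = np.abs(proj) ** 2 / norms[:, None, :]
    best = np.argmax(score, axis=-1)                    # (F, N)
    J = A.shape[2]
    masks = (best[..., None] == np.arange(J)).astype(float)
    # Each source image is the two-channel mixture restricted to its bins.
    images = masks[:, :, None, :] * X[:, :, :, None]
    return masks, images

def init_spatial_covariances(A, eps=1e-3):
    """Full-rank covariances R_j(f) = a_j a_j^H + eps * I (eps is hypothetical)."""
    R = np.einsum('fcj,fdj->fjcd', A, A.conj())         # (F, J, 2, 2)
    return R + eps * np.eye(2)
```

Since the binary masks partition the time-frequency plane, the masked source images add back up to the mixture, and the diagonal load keeps each covariance full-rank as required by the model.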
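Initialization step 5 and the Wiener filtering of algorithm step 2 can be illustrated with the NumPy sketch below. It assumes the local Gaussian model, in which source j has covariance v_j(f,n) R_j(f), with spectral power v_j from NMF and full-rank spatial covariance R_j(f); the Wiener estimate of source image j is then v_j R_j Sigma_x^-1 x, where Sigma_x = sum_j v_j R_j. The KL-NMF uses the standard Lee-Seung multiplicative updates and does not implement the harmonic constraints used for speech or the GEM updates of [1]. The function names, the iteration count, and `eps` are hypothetical.

```python
import numpy as np

def kl_nmf(V, K, n_iter=100, seed=0, eps=1e-12):
    """KL-divergence NMF V ~ W @ H with Lee-Seung multiplicative updates.

    V: nonnegative power spectrogram, shape (F, N).
    Returns W (F, K), H (K, N), and the divergence after each iteration.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    history = []
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        WH = W @ H + eps
        # Generalized Kullback-Leibler divergence D(V | WH).
        history.append(np.sum(V * np.log((V + eps) / WH) - V + WH))
    return W, H, history

def multichannel_wiener(X, v, R):
    """Source-image estimates under the (assumed) local Gaussian model.

    X: mixture STFT, shape (F, N, I).
    v: source spectral powers, shape (F, N, J), e.g. v[..., j] = W_j @ H_j.
    R: full-rank spatial covariances, shape (F, J, I, I).
    Returns S, shape (F, N, I, J), with S[f, n, :, j] = v_j R_j Sigma_x^-1 x.
    """
    # Covariance of each source image: v_j(f, n) R_j(f), shape (F, N, J, I, I).
    Sigma_s = v[..., None, None] * R[:, None, :, :, :]
    Sigma_x = Sigma_s.sum(axis=2)                       # (F, N, I, I)
    # Solve Sigma_x y = x instead of forming the inverse explicitly.
    y = np.linalg.solve(Sigma_x, X[..., None])[..., 0]  # (F, N, I)
    return np.einsum('fnjcd,fnd->fncj', Sigma_s, y)
```

Because the Wiener gains v_j R_j Sigma_x^-1 sum to the identity over sources, the estimated source images add up exactly to the mixture, which is a convenient sanity check on an implementation.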