Alexey Ozerov
IRISA / INRIA - Rennes
alexey.ozerov@irisa.fr
A. Ozerov, S. Arberet, and E. Vincent

ALGORITHM:

1. 200 iterations of the Generalized Expectation-Maximization (GEM) algorithm from [1] are run to jointly estimate the parameters of the flexible model. The following particular models were used:

   a. for speech sources: harmonic NMF (for the spectral power) / rank-1 spatial covariance (see [1] for details)

   b. for music sources: NMF with K = 4 components (for the spectral power) / rank-1 spatial covariance (see [1] for details)

2. The sources are recovered by Wiener filtering, as described in [1], given the estimated model.
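
As a rough illustration of step 2, the sketch below applies a multichannel Wiener filter under the model named above: each source has a time-frequency power v_j(f,n) (for example, the product of its NMF factors) and a rank-1 spatial covariance a_j(f) a_j(f)^H. The function name, array layout and the small regularization term eps are assumptions made for this example; it is not the implementation from [1].

```python
import numpy as np

def wiener_separate(X, A, V, eps=1e-9):
    """Multichannel Wiener filtering with rank-1 spatial covariances.

    X : (F, N, I) complex mixture STFT (I channels)
    A : (F, I, J) complex mixing vectors a_j(f), one column per source
    V : (J, F, N) nonnegative source powers v_j(f, n)
    Returns C : (J, F, N, I), the estimated spatial image of each source.
    eps is a small diagonal loading (an assumption of this sketch).
    """
    I = X.shape[2]
    # Rank-1 spatial covariances R_j(f) = a_j(f) a_j(f)^H, shape (J, F, I, I)
    R = np.einsum('fij,fkj->jfik', A, A.conj())
    # Mixture covariance Sigma_x(f, n) = sum_j v_j(f, n) R_j(f) + eps * Id
    Sigma_x = np.einsum('jfn,jfik->fnik', V, R) + eps * np.eye(I)
    # y(f, n) = Sigma_x(f, n)^{-1} x(f, n)
    y = np.einsum('fnik,fnk->fni', np.linalg.inv(Sigma_x), X)
    # c_j(f, n) = v_j(f, n) R_j(f) y(f, n)
    return np.einsum('jfn,jfik,fnk->jfni', V, R, y)
```

Because the Wiener gains of all sources sum to (nearly) the identity, the estimated source images add up to the mixture, up to the effect of eps.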


INITIALIZATION:
   
1. An initial mixing matrix is estimated by the DEMIX algorithm [2].

2. Initial source estimates are obtained by l0-norm minimization of the source STFTs [3], given the initial mixing matrix.

3. Initial NMF source decompositions are computed from the power spectrograms of the initial source estimates by minimizing the Kullback-Leibler divergence.
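
Initialization step 3 can be sketched as follows. The text does not say which optimizer was used to minimize the Kullback-Leibler divergence, so this example assumes the standard multiplicative updates; it is also unconstrained, whereas the harmonic NMF used for speech imposes additional structure on the spectral patterns (see [1]). The function names, the random initialization and the default iteration count are assumptions of this sketch.

```python
import numpy as np

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || V_hat)."""
    return np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)

def nmf_kl(V, K=4, n_iter=100, seed=0, eps=1e-12):
    """Factorize a nonnegative power spectrogram V (F x N) as W @ H,
    with W (F x K) and H (K x N), using multiplicative KL updates."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # Update spectral patterns W with H fixed
        V_hat = W @ H + eps
        W *= ((V / V_hat) @ H.T) / (H.sum(axis=1) + eps)
        # Update temporal activations H with W fixed
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat)) / (W.sum(axis=0)[:, None] + eps)
    return W, H
```

Here V would be the squared magnitude of one initial source STFT, and K = 4 matches the number of components quoted for music sources.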

COMPUTATIONAL TIME:

On a 2.2 GHz CPU, our Matlab implementation runs for up to 25 minutes, depending on the particular configuration.

REFERENCES: 

[1] A. Ozerov, E. Vincent, and F. Bimbot, "A General Modular Framework for Audio Source Separation", In Proc. 9th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2010). (2010), submitted

[2] S. Arberet, R. Gribonval, and F. Bimbot, "A robust method to count and locate audio sources in a stereophonic linear instantaneous mixture", In Proc. Int. Conf. on Independent Component Analysis and Blind Source Separation (ICA). (2006) 536-543

[3] E. Vincent, "Complex nonconvex lp norm minimization for underdetermined source separation", In Proc. Int. Conf. on Independent Component Analysis and Blind Source Separation (ICA). (2007)