Michael Mandel
LabROSA
Department of Electrical Engineering
Columbia University
mim@ee.columbia.edu
These results are joint work with Ron Weiss.

Results were obtained using the binaural source localization algorithm
described in the references below. We construct a generative model of the
interaural phase and level differences (IPD and ILD) and learn the parameters
corresponding to each source using an EM algorithm. Adding the ILD model gives
better separation than using IPD alone, but it provides no benefit on the 5 cm
subset because the ILDs there are quite small. Nevertheless, we included the
results on this subset for comparison with our other submissions.
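
As a rough illustration of the EM idea only, the sketch below fits a toy
one-dimensional Gaussian mixture to ILD-like values, one Gaussian per source.
This is a deliberately simplified stand-in, not the IPD/ILD model of the
referenced paper: the function names, the variance floor, and the use of a
single scalar per time-frequency cell are all assumptions made for the example.
As in our submission, the only prior information it needs is the number of
sources, given here by the length of the initial means.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_soft_assign(values, means, n_iter=50, var_floor=1e-3):
    """Toy EM for a 1-D Gaussian mixture (NOT the full IPD/ILD model).

    values: one observation per time-frequency cell (e.g. an ILD in dB).
    means:  initial mean per source; its length fixes the number of sources.
    Returns (means, variances, weights, posteriors), where posteriors[i][k]
    is the probability, from the final E-step, that source k generated
    values[i].
    """
    n_src = len(means)
    means = list(means)
    variances = [1.0] * n_src          # hypothetical initial variance
    weights = [1.0 / n_src] * n_src    # start with equal source weights
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability of each source for each cell.
        posteriors = []
        for x in values:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            posteriors.append([j / total for j in joint])
        # M-step: re-estimate each source's parameters from the weighted cells.
        for k in range(n_src):
            resp = [p[k] for p in posteriors]
            mass = sum(resp) or 1e-300
            means[k] = sum(r * x for r, x in zip(resp, values)) / mass
            var = sum(r * (x - means[k]) ** 2
                      for r, x in zip(resp, values)) / mass
            variances[k] = max(var, var_floor)  # keep variances from collapsing
            weights[k] = mass / len(values)
    return means, variances, weights, posteriors
```

Calling em_soft_assign(values, [-1.0, 1.0]) on values clustered around two
distinct levels returns one row of posteriors per value. In this toy setting
those posteriors play the role of a per-cell soft assignment to sources.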

The EM procedure is used to generate a spectrographic mask for each source
where each cell contains the probability that a given source dominates the
mixture in that cell. Stereo separation was performed simply by multiplying
the STFT of each channel of the mixture by the "soft mask" for each source and
inverting the result. Monaural separation was performed by choosing the better
ear for each source (i.e., the channel that contained the most energy after
applying the mask).
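
The masking and better-ear steps described above can be sketched as follows.
This is a minimal illustration, not our Matlab implementation: it assumes the
STFT of each channel and the mask for one source are already available as
equally sized nested lists (rows of complex STFT cells, rows of probabilities
in [0, 1]), and the function names are hypothetical. The inverse STFT is left
out.

```python
def apply_mask(stft, mask):
    """Multiply each STFT cell by the source's soft-mask probability."""
    return [[m * s for m, s in zip(mask_row, stft_row)]
            for mask_row, stft_row in zip(mask, stft)]


def energy(stft):
    """Total energy: sum of squared magnitudes over all cells."""
    return sum(abs(s) ** 2 for row in stft for s in row)


def separate_source(stft_left, stft_right, mask):
    """Mask both channels for one source and pick the better ear.

    Returns (masked_left, masked_right, better), where better is "left" or
    "right" depending on which masked channel holds more energy. Ties go to
    the left channel, which is an arbitrary choice made for this sketch.
    """
    masked_left = apply_mask(stft_left, mask)
    masked_right = apply_mask(stft_right, mask)
    if energy(masked_left) >= energy(masked_right):
        better = "left"
    else:
        better = "right"
    return masked_left, masked_right, better
```

For the stereo output, both masked channels would be inverted back to the time
domain. For the monaural output, only the channel named by better would be
inverted.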

Our Matlab implementation of the algorithm took about 5 minutes per signal on
a 1.8 GHz Intel Xeon. The only prior information needed was the number of
sources in the mixture.

References:

M. I. Mandel and D. P. W. Ellis, "EM localization and separation using
interaural level and phase cues," in IEEE Workshop on Applications of Signal
Processing to Audio and Acoustics, October 2007.