Multichannel Nonnegative Tensor Factorization With Structured Constraints For User-Guided Audio Source Separation

Sound examples for the paper: A. Ozerov, C. Févotte, R. Blouet and J.-L. Durrieu, "Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'11), Prague, May 2011 (to appear).

Recording #1 (snippet, 23 to 43 seconds)
Recording #2 (snippet, 60 to 80 seconds)
Recording #3 (snippet, 44 to 64 seconds)
Tracks separated by the proposed method:
Recording #1: Trumpet | Bass | Drums
Recording #2: Voice | Sax | Bass | Piano | Brushes
Recording #3: Voice | Violin | F.Horn | Bass

Tracks separated by DUET [1]:
Recording #1: Trumpet | Bass+Drums
Recording #2: Voice+Sax | Bass | Piano+Brushes
Recording #3: (not available)
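DUET [1], the baseline above, separates a stereo mixture by binary time-frequency masking: each time-frequency bin is assigned entirely to one source. A toy sketch of the masking idea only (not the full DUET algorithm, which estimates attenuation/delay histograms from the stereo pair; the spectrograms below are fabricated with disjoint supports for illustration):

```python
import numpy as np

n_fft, frames = 8, 4
rng = np.random.default_rng(1)

# Toy magnitude spectrograms: source 0 occupies the low bins,
# source 1 the high bins (W-disjoint orthogonality, idealized).
S0 = np.zeros((n_fft, frames)); S0[:4] = rng.random((4, frames)) + 1.0
S1 = np.zeros((n_fft, frames)); S1[4:] = rng.random((4, frames)) + 1.0
mix = S0 + S1

# Binary mask: each bin goes to whichever source dominates it.
mask0 = S0 >= S1
est0 = mix * mask0          # estimate of source 0
est1 = mix * (~mask0)       # estimate of source 1
```

Because the toy supports are exactly disjoint, the masks recover the sources perfectly here; on real mixtures the bins overlap, which is why DUET returns merged tracks such as "Bass+Drums" above.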
Time codes (of full-length recordings)
Scatter plots (of full-length recordings)
Upmixing to 5.1 using the sources separated by the proposed method
(C: center; L/R: left, right; Lf: low frequency; Ls/Rs: L/R surround):
Recording #1: C | L | R | Lf | Ls | Rs
Recording #2: C | L | R | Lf | Ls | Rs
Recording #3: C | L | R | Lf | Ls | Rs
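In its simplest form, an upmix like the one above routes each separated source to the six 5.1 channels through a static gain matrix. A minimal sketch with entirely hypothetical gains (the page does not describe the actual upmixing procedure used):

```python
import numpy as np

def upmix_to_51(sources, gains):
    """sources: (n_sources, n_samples) separated tracks.
    gains: (6, n_sources) routing matrix, rows = C, L, R, Lf, Ls, Rs.
    Returns (6, n_samples) channel signals."""
    return gains @ sources

# Example: three separated sources (e.g. trumpet, bass, drums), 4 samples each.
rng = np.random.default_rng(0)
sources = rng.standard_normal((3, 4))

# Hypothetical routing: trumpet to center, bass to Lf plus the front pair,
# drums spread over front and surround channels.
gains = np.array([
    [1.0, 0.0, 0.0],   # C
    [0.0, 0.3, 0.5],   # L
    [0.0, 0.3, 0.5],   # R
    [0.0, 0.8, 0.0],   # Lf
    [0.0, 0.0, 0.4],   # Ls
    [0.0, 0.0, 0.4],   # Rs
])

mix51 = upmix_to_51(sources, gains)
```

With this routing the center channel is just the first source, which is what puts the lead instrument in the middle of the 5.1 image.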
Videos from the SARAH project

References:

[1] O. Yilmaz and S. Rickard, “Blind separation of speech mixtures via time-frequency masking,” IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830–1847, 2004.


Alexey Ozerov, Cédric Févotte, Raphaël Blouet, Jean-Louis Durrieu