Multichannel Nonnegative Tensor Factorization With Structured Constraints For User-Guided Audio Source Separation

Sound examples for the paper: A. Ozerov, C. Févotte, R. Blouet and J.-L. Durrieu, "Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'11), Prague, May 2011 (to appear).

Recording #1 (snippet, 23 to 43 seconds)
Recording #2 (snippet, 60 to 80 seconds)
Recording #3 (snippet, 44 to 64 seconds)
Tracks separated by the proposed method:
Recording #1: Trumpet | Bass | Drums
Recording #2: Voice | Sax | Bass | Piano | Brushes
Recording #3: Voice | Violin | F.Horn | Bass

Tracks separated by DUET [1]:
Recording #1: Trumpet | Bass+Drums
Recording #2: Voice+Sax | Bass | Piano+Brushes
Recording #3: (not available)
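DUET [1], the baseline above, separates a stereo mixture by binary time-frequency masking: each time-frequency bin is assigned entirely to one source. A toy sketch of the masking idea only (not the full DUET algorithm, which estimates attenuation/delay histograms from the stereo pair; the spectrograms below are fabricated with disjoint supports for illustration):

```python
import numpy as np

n_fft, frames = 8, 4
rng = np.random.default_rng(1)

# Toy magnitude spectrograms: source 0 occupies the low bins,
# source 1 the high bins (W-disjoint orthogonality, idealized).
S0 = np.zeros((n_fft, frames)); S0[:4] = rng.random((4, frames)) + 1.0
S1 = np.zeros((n_fft, frames)); S1[4:] = rng.random((4, frames)) + 1.0
mix = S0 + S1

# Binary mask: each bin goes to whichever source dominates it.
mask0 = S0 >= S1
est0 = mix * mask0          # estimate of source 0
est1 = mix * (~mask0)       # estimate of source 1
```

Because the toy supports are exactly disjoint, the masks recover the sources perfectly here; on real mixtures the bins overlap, which is why DUET returns merged tracks such as "Bass+Drums" above.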
Time codes (of full-length recordings)
Scatter plots (of full-length recordings)
Upmixing to 5.1 using the sources separated by the proposed method
(C: center; L/R: left, right; Lf: low frequency; Ls/Rs: L/R surround):
Recording #1: C | L | R | Lf | Ls | Rs
Recording #2: C | L | R | Lf | Ls | Rs
Recording #3: C | L | R | Lf | Ls | Rs
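In its simplest form, an upmix like the one above routes each separated source to the six 5.1 channels through a static gain matrix. A minimal sketch with entirely hypothetical gains (the page does not describe the actual upmixing procedure used):

```python
import numpy as np

def upmix_to_51(sources, gains):
    """sources: (n_sources, n_samples) separated tracks.
    gains: (6, n_sources) routing matrix, rows = C, L, R, Lf, Ls, Rs.
    Returns (6, n_samples) channel signals."""
    return gains @ sources

# Example: three separated sources (e.g. trumpet, bass, drums), 4 samples each.
rng = np.random.default_rng(0)
sources = rng.standard_normal((3, 4))

# Hypothetical routing: trumpet to center, bass to Lf plus the front pair,
# drums spread over front and surround channels.
gains = np.array([
    [1.0, 0.0, 0.0],   # C
    [0.0, 0.3, 0.5],   # L
    [0.0, 0.3, 0.5],   # R
    [0.0, 0.8, 0.0],   # Lf
    [0.0, 0.0, 0.4],   # Ls
    [0.0, 0.0, 0.4],   # Rs
])

mix51 = upmix_to_51(sources, gains)
```

With this routing the center channel is just the first source, which is what puts the lead instrument in the middle of the 5.1 image.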
Videos from the SARAH project

References:

[1] O. Yilmaz and S. Rickard, “Blind separation of speech mixtures via time-frequency masking,” IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830–1847, 2004.


Alexey Ozerov, Cédric Févotte, Raphaël Blouet, Jean-Louis Durrieu