author: Francesco Nesta
affiliation: Fondazione Bruno Kessler-Irst, Trento, Italy, cit-SHINE unit
contact: nesta (at) fbk.eu 
web address: http://shine.fbk.eu/people/nesta


Both datasets were processed with a two-channel batch on-line Blind Source Separation algorithm based on a frequency-domain weighted Natural Gradient [1][2]. The signals recorded at microphones 1 and 4 were used as mixtures.

The weights for the weighted Natural Gradient were determined according to the coherence of each frame in the TDOA direction of the target source (estimated through the Generalized State Coherence Transform [3] computed from each STFT input frame [2]).

No further permutation alignment method was applied.

Alg 1
Signals were transformed from the time domain to the STFT domain with Hanning windows of 2048 points shifted by 512 points. The batch ICA uses blocks of 30 frames, shifted by 15 frames. In each batch, ICA runs for 20 iterations with a step size of 0.2. Note that this low number of iterations is sufficient thanks to the efficient gradient weighting (see [1], [2]). In each batch and frequency bin, the sources are linearly demixed. The estimated components are then used to determine, for each TF point, the optimal gains of a single-channel Wiener filter, which is applied to each channel to estimate the multichannel image [3].
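
The framing used by Alg 1 can be sketched as follows. This is only an illustrative sketch in Python/NumPy, not the code used for the submission. The function names (stft_frames, batch_blocks, wiener_gains) are hypothetical, the input is assumed to be at least one window long, and the Wiener gain is written in its common power-ratio form, which is an assumption about the exact formula. The weighted Natural Gradient ICA itself is not shown.

```python
import numpy as np

WIN = 2048      # window length in samples (from the text)
HOP = 512       # window shift in samples (from the text)
BLOCK = 30      # frames per ICA batch (from the text)
BLOCK_HOP = 15  # batch shift in frames (from the text)


def stft_frames(x, win=WIN, hop=HOP):
    """Return STFT coefficients with shape (n_frames, win // 2 + 1).

    Assumes len(x) >= win; trailing samples that do not fill a frame are dropped.
    """
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def batch_blocks(n_frames, block=BLOCK, hop=BLOCK_HOP):
    """Return (start, stop) frame indices of each overlapping batch."""
    return [(s, s + block) for s in range(0, n_frames - block + 1, hop)]


def wiener_gains(components, eps=1e-12):
    """Power-ratio gains per TF point (assumed form of the Wiener gain).

    components: complex array of shape (n_src, n_frames, n_bins).
    """
    power = np.abs(components) ** 2
    return power / (power.sum(axis=0, keepdims=True) + eps)
```

With these values, consecutive batches overlap by half (15 of 30 frames), and the gains of all components sum to approximately one at each TF point.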

Alg 2
Same processing as Alg 1, but the output signals are post-processed with binary masking, i.e. for each TF point only the source with the highest power is retained.
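
The binary masking step can be sketched as follows. This is a minimal illustration in Python/NumPy, not the submitted implementation. The function name binary_masks and the array layout (sources, frames, bins) are assumptions made for the example.

```python
import numpy as np


def binary_masks(components):
    """Return 0/1 masks that keep, at each TF point, the strongest source.

    components: complex or real array of shape (n_src, n_frames, n_bins).
    Ties are resolved in favour of the lowest source index (argmax behaviour).
    """
    power = np.abs(components) ** 2
    winner = power.argmax(axis=0)                  # (n_frames, n_bins)
    src_index = np.arange(components.shape[0])[:, None, None]
    return (src_index == winner[None]).astype(float)
```

Multiplying each estimated source by its mask zeroes every TF point where another source has higher power.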


[1] "Convolutive BSS of short mixtures by ICA recursively regularized across frequencies", Francesco Nesta, Piergiorgio Svaizer, Maurizio Omologo, IEEE Transactions on Audio, Speech and Language Processing, March 2011, issue 3

[2] "Convolutive underdetermined source separation through weighted Interleaved ICA and spatio-temporal correlation", Francesco Nesta, Maurizio Omologo, submitted to LVA/ICA 2012, Tel Aviv

[3] "Generalized State Coherence Transform for multidimensional TDOA estimation of multiple sources", Francesco Nesta, Maurizio Omologo, to appear in IEEE Transactions on Audio, Speech, and Language Processing

[4] "Robust Automatic Speech Recognition through On-line Semi-Blind Source Extraction", Francesco Nesta, Marco Matassoni, CHIME Workshop 2011, Florence (Italy)