We propose to evaluate the results using three objective criteria, which extend those of the BSS Eval toolbox to multichannel signals. These criteria can be computed for all types of separation algorithms and do not require knowledge of the separating filters or masks.
The quality of the estimated spatial image s_est_ij(t) of source j on channel i, that is, the estimated contribution of this source to the observed mixture in this channel, is assessed by the decomposition
    s_est_ij(t) = s_true_ij(t) + e_spat_ij(t) + e_interf_ij(t) + e_artif_ij(t)
where s_true_ij(t) is the true source image and e_spat_ij(t), e_interf_ij(t) and e_artif_ij(t) are distinct error components representing spatial (or filtering) distortion, interference and artifacts, respectively. This decomposition is motivated by the auditory segregation between sounds from the target source, sounds from other sources and "gurgling" noise, which correspond respectively to the signals s_true_ij(t)+e_spat_ij(t), e_interf_ij(t) and e_artif_ij(t). The signal s_true_ij(t)+e_spat_ij(t) is obtained by least-squares projection of the estimated source image onto the signal subspace spanned by filtered versions of the true source image. Similarly, the signal s_true_ij(t)+e_spat_ij(t)+e_interf_ij(t) is obtained by least-squares projection of the estimated source image onto the signal subspace spanned by filtered versions of all the source images. The filter length is set to 512 samples (32 ms), which is the maximal tractable length.
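The two projections described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BSS Eval implementation: the names `delayed`, `solve`, `project` and `decompose` are hypothetical, the "filtered versions" are taken to be causal delays 0 to filt_len-1 of every channel of a source image, and a filter length of 2 with short random signals replaces the 512-tap setting to keep the example small. Only the standard library is used; a practical implementation would rely on a linear algebra library.

```python
import random

def delayed(x, d):
    """Return x delayed by d samples (zero-padded, same length as x)."""
    return [0.0] * d + x[:len(x) - d]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    c = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * c[k] for k in range(r + 1, n))
        c[r] = (M[r][n] - tail) / M[r][r]
    return c

def project(y, basis):
    """Least-squares projection of y onto the span of the basis signals."""
    gram = [[dot(u, v) for v in basis] for u in basis]
    coeffs = solve(gram, [dot(u, y) for u in basis])
    return [sum(c * u[t] for c, u in zip(coeffs, basis)) for t in range(len(y))]

def decompose(s_est, images, j, i, filt_len):
    """Split s_est (estimate of source j on channel i) into four components.

    images[n][ch] is the true image of source n on channel ch.
    Assumption: the subspace is spanned by delays 0..filt_len-1 of all
    channels of the chosen source images.
    """
    def span(sources):
        return [delayed(ch, d) for n in sources
                for ch in images[n] for d in range(filt_len)]

    s_true = images[j][i]
    p_target = project(s_est, span([j]))               # s_true + e_spat
    p_all = project(s_est, span(range(len(images))))   # ... + e_interf
    e_spat = [p - s for p, s in zip(p_target, s_true)]
    e_interf = [a - p for a, p in zip(p_all, p_target)]
    e_artif = [y - a for y, a in zip(s_est, p_all)]
    return s_true, e_spat, e_interf, e_artif

# Toy data: 2 sources, 2 channels, 200 samples of Gaussian noise.
random.seed(0)
T = 200
images = [[[random.gauss(0, 1) for _ in range(T)] for _ in range(2)]
          for _ in range(2)]
# Estimate of source 0 on channel 0: attenuated target, leaked interferer, noise.
s_est = [0.9 * a + 0.1 * b + 0.01 * random.gauss(0, 1)
         for a, b in zip(images[0][0], images[1][0])]
s_true, e_spat, e_interf, e_artif = decompose(s_est, images, j=0, i=0, filt_len=2)
```

By construction the four returned components sum to the estimate, and the artifact term is the residual of the second projection, so it is orthogonal to every delayed true source image.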
The relative amounts of spatial distortion, interference and artifacts are then measured using three energy ratio criteria expressed in decibels (dB).
The correspondence between these criteria and the three error components is illustrated below.
- the source Image to Spatial distortion Ratio (ISR)
- the Source to Interference Ratio (SIR)
- the Sources to Artifacts Ratio (SAR)
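As a hedged illustration of how these three ratios relate to the error components, the sketch below assumes the usual BSS Eval image definitions: each criterion is 10 log10 of an energy ratio, with energies summed over all channels i and samples t. ISR compares s_true with e_spat, SIR compares s_true+e_spat with e_interf, and SAR compares s_true+e_spat+e_interf with e_artif. The function names `energy`, `mix` and `image_criteria` and the toy values are hypothetical and are not part of the distributed M-files.

```python
import math

def energy(channels):
    """Total energy of a multichannel signal (list of channels)."""
    return sum(x * x for ch in channels for x in ch)

def mix(*signals):
    """Sample-wise sum of multichannel signals of identical shape."""
    return [[sum(vals) for vals in zip(*chans)] for chans in zip(*signals)]

def image_criteria(s_true, e_spat, e_interf, e_artif):
    """Return (ISR, SIR, SAR) in dB for one source.

    Each argument is a list of channels, each channel a list of samples.
    """
    target = mix(s_true, e_spat)                # s_true + e_spat
    target_interf = mix(target, e_interf)       # s_true + e_spat + e_interf
    isr = 10 * math.log10(energy(s_true) / energy(e_spat))
    sir = 10 * math.log10(energy(target) / energy(e_interf))
    sar = 10 * math.log10(energy(target_interf) / energy(e_artif))
    return isr, sir, sar

# Toy two-channel example with hand-picked components.
s_true   = [[3.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]   # energy 9
e_spat   = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]   # energy 1
e_interf = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]   # energy 1
e_artif  = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]]   # energy 1
isr, sir, sar = image_criteria(s_true, e_spat, e_interf, e_artif)
```

In this toy case the energy ratios are 9, 10 and 11, so the SIR is 10 dB and the ISR and SAR lie just below and just above it.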
The above criteria were implemented in Matlab as a collection of two M-files distributed under the terms of the GNU General Public License:
Representing the estimated spatial image of source j by a matrix se of size 2 × 160000 and the true spatial images of all sources by a matrix S of size 2 × 160000 × 3 or 2 × 160000 × 4 (depending on the number of sources), the criteria can be computed via
We welcome any additional criteria proposed by the participants.