Musical Audio Source Separation Based on User-Selected F0 Track
===============================================================

Jean-Louis Durrieu, EPFL-STI-IEL-LTS5
jean-louis.durrieu@epfl.ch

We provide the results for two systems: an automatic system, described in
[DDR2011], for the files starting with "durrieuAuto", and a user-assisted
system, described in [DT2012], for the files starting with "durrieuUser".

durrieuAuto
-----------

The fully automatic system described in [DDR2011] relies on a decomposition of
the short-term power spectrum (STPS) into several contributions, namely the
source part and the filter part of the leading instrument, plus the
accompaniment. This decomposition makes it possible to track the "leading"
instrument by following the melody it plays, provided it is the dominant
instrument in the audio mixture (in terms of energy). We provide the results
of the VUIMM system [DDR2011]. A schematic sketch of the final Wiener-filtering
separation step is given at the end of this file.

The given files are:
    durrieuAuto_dev2.zip, durrieuAuto_test1.zip and durrieuAuto_test2.zip

Each zip file contains the estimated vocal tracks for the corresponding
dataset, plus, for test1, the estimated guitar track for the song by tamy.

The processing time was about 10 min (600 s) per excerpt, on a Core i7
@ 2.93 GHz. Note that the code was not optimized or parallelized.

For dev2, our own computation of the evaluation metrics SDR/ISR/SIR/SAR
(see also the evaluation sketch at the end of this file):

    Song                                     SDR    ISR    SIR    SAR
    dev2__another_dreamer-the_ones_we_love   5.74   9.15  13.33   6.65
    dev2__fort_minor-remember_the_name       2.38   4.78   4.24   2.94
    dev2__ultimate_nz_tour                   3.44   6.47   8.01   3.36

durrieuUser
-----------

The user-guided system is described in an article submitted to LVA/ICA 2012
[DT2012]. We designed a GUI that builds on the work from [DDR2011]: it first
presents the resulting F0 representation to the user, who selects the melody
track she wishes to extract; the separation is then performed, given her input
F0 track.

The corresponding files are:
    durrieuUser_dev2.zip, durrieuUser_test1.zip and durrieuUser_test2.zip

Again, the estimated vocal tracks are provided in these files.

The processing time varies from one song to another, because of the need for
user feedback. On average, one can add 10 min per excerpt, so the full
processing of one excerpt may require about 20 min (1200 s), on a Core i7
@ 2.93 GHz.

Note that for 2 songs (test2__shannon_hurley-sunrise__snip_62_85__mix and
test1__bearlin-roads__snip__mix), because the vocals are polyphonic, we ran
the program twice, extracting one of the desired vocal tracks at each run.
The final estimated vocal track was generated as the sum of both estimates
(see the sketch at the end of this file). We noticed that the best
(subjective) combination was to first extract the lead vocals with VUIMM, and
then to extract the backing vocals with VIMM [DDR2011].

For dev2, our own computation of the evaluation metrics SDR/ISR/SIR/SAR:

    Song                                     SDR    ISR    SIR    SAR
    dev2__another_dreamer-the_ones_we_love   6.22   9.69  14.39   7.04
    dev2__fort_minor-remember_the_name       3.70   6.21   8.18   4.09
    dev2__ultimate_nz_tour                   4.38   7.18  11.01   4.62

Reference
---------

[DDR2011] J.-L. Durrieu, B. David and G. Richard, "A Musically Motivated
    Mid-Level Representation for Pitch Estimation and Musical Audio Source
    Separation", IEEE Journal of Selected Topics in Signal Processing,
    October 2011, Vol. 5 (6), pp. 1180-1191.

[DT2012] J.-L. Durrieu and J.-Ph. Thiran, "Musical Audio Source Separation
    Based on User-Selected F0 Track", submitted to LVA/ICA, March 12-15, 2012,
    Tel Aviv, Israel.
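
Illustrative sketches
---------------------

The snippets below are informal illustrations only; they are not the code used
to produce the submitted results. The actual parameter estimation (source/filter
model of the lead instrument plus accompaniment) is described in [DDR2011].
Assuming the model has already produced power-spectrogram estimates of the lead
instrument and of the accompaniment on the mixture's STFT grid, the final
separation step amounts to Wiener filtering, roughly as in the following
Python/NumPy sketch (function and variable names are ours, not from the
submission):

    # Schematic sketch (NOT the actual VUIMM/IMM code): Wiener-filter separation
    # of the lead instrument, assuming power-spectrogram estimates s_lead and
    # s_accomp are available on the same STFT grid as the mono mixture.
    import numpy as np
    from scipy.signal import stft, istft

    def wiener_separate(mixture, s_lead, s_accomp, fs, nperseg=2048):
        """Return (lead, accompaniment) time-domain estimates for a mono mixture."""
        _, _, x = stft(mixture, fs=fs, nperseg=nperseg)   # complex STFT, freq x frames
        mask = s_lead / (s_lead + s_accomp + 1e-12)       # soft (Wiener) mask in [0, 1]
        _, lead = istft(mask * x, fs=fs, nperseg=nperseg)
        _, accomp = istft((1.0 - mask) * x, fs=fs, nperseg=nperseg)
        return lead, accomp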
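
The dev2 figures given in the tables above come from our own computation of the
BSS Eval "image" metrics (SDR/ISR/SIR/SAR). Purely as an illustration, similar
numbers can be obtained with the mir_eval package; this is an assumption of the
sketch below (mir_eval was not used for the tables above, and the file names
are placeholders):

    # Hypothetical sketch: BSS Eval image metrics (SDR/ISR/SIR/SAR) for one
    # excerpt, computed with mir_eval. File names are placeholders.
    import numpy as np
    import soundfile as sf
    import mir_eval

    ref_vocals, fs = sf.read("dev2_song__ref_vocals.wav")
    ref_accomp, _ = sf.read("dev2_song__ref_accompaniment.wav")
    est_vocals, _ = sf.read("dev2_song__est_vocals.wav")
    est_accomp, _ = sf.read("dev2_song__est_accompaniment.wav")

    # mir_eval expects arrays of shape (nsrc, nsamples, nchannels).
    reference = np.stack([ref_vocals, ref_accomp])
    estimate = np.stack([est_vocals, est_accomp])

    sdr, isr, sir, sar, _ = mir_eval.separation.bss_eval_images(reference, estimate)
    print("vocals: SDR %.2f  ISR %.2f  SIR %.2f  SAR %.2f"
          % (sdr[0], isr[0], sir[0], sar[0]))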
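
Finally, for the two songs with polyphonic vocals, the submitted vocal track is
simply the sample-wise sum of the estimates from the two user-guided runs,
along the lines of this sketch (placeholder file names, not the submission's
actual file names):

    # Hypothetical sketch: combining two user-guided runs by summing the
    # estimated lead-vocal and backing-vocal tracks.
    import soundfile as sf

    lead, fs = sf.read("run1_lead_vocals_estimate.wav")
    backing, fs2 = sf.read("run2_backing_vocals_estimate.wav")
    assert fs == fs2 and lead.shape == backing.shape

    sf.write("final_vocals_estimate.wav", lead + backing, fs)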