Subjective and Objective Quality Assessment of Audio Source Separation

Cited by: 204
Authors
Emiya, Valentin [1]
Vincent, Emmanuel [1]
Harlander, Niklas [2]
Hohmann, Volker [2]
Affiliations
[1] INRIA, Ctr Inria Rennes Bretagne Atlantique, F-35042 Rennes, France
[2] Carl von Ossietzky Univ Oldenburg, D-26111 Oldenburg, Germany
Source
IEEE Transactions on Audio, Speech, and Language Processing | 2011, Vol. 19, No. 7
Keywords
Audio; objective measure; quality assessment; source separation; subjective test protocol; MODEL
DOI
10.1109/TASL.2011.2109381
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We aim to assess the perceived quality of estimated source signals in the context of audio source separation. These signals may involve one or more kinds of distortions, including distortion of the target source, interference from the other sources or musical noise artifacts. We propose a subjective test protocol to assess the perceived quality with respect to each kind of distortion and collect the scores of 20 subjects over 80 sounds. We then propose a family of objective measures aiming to predict these subjective scores based on the decomposition of the estimation error into several distortion components and on the use of the PEMO-Q perceptual salience measure to provide multiple features that are then combined. These measures increase correlation with subjective scores up to 0.5 compared to nonlinear mapping of individual state-of-the-art source separation measures. Finally, we released the data and code presented in this paper in a freely available toolkit called PEASS.
Pages: 2046-2057
Page count: 12