Model-Based Expectation-Maximization Source Separation and Localization

Cited by: 236
Authors
Mandel, Michael I. [1]
Weiss, Ron J. [1]
Ellis, Daniel P. W. [1]
Affiliations
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Source
IEEE Transactions on Audio, Speech, and Language Processing | 2010 / Vol. 18 / No. 2
Funding
National Science Foundation (USA)
Keywords
Maximum-likelihood estimation; speech enhancement; time-frequency masking; underdetermined source separation; SOUND SOURCES; STATISTICS
DOI
10.1109/TASL.2009.2029711
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper describes a system, referred to as model-based expectation-maximization source separation and localization (MESSL), for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording. By clustering individual spectrogram points based on their interaural phase and level differences, MESSL generates masks that can be used to isolate individual sound sources. We first describe a probabilistic model of interaural parameters that can be evaluated at individual spectrogram points. By creating a mixture of these models over sources and delays, the multi-source localization problem is reduced to a collection of single source problems. We derive an expectation-maximization algorithm for computing the maximum-likelihood parameters of this mixture model, and show that these parameters correspond well with interaural parameters measured in isolation. As a byproduct of fitting this mixture model, the algorithm creates probabilistic spectrogram masks that can be used for source separation. In simulated anechoic and reverberant environments, separations using MESSL produced on average a signal-to-distortion ratio 1.6 dB greater and Perceptual Evaluation of Speech Quality (PESQ) results 0.27 mean opinion score units greater than four comparable algorithms.
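The clustering idea in the abstract can be illustrated with a much-simplified sketch. It is not the MESSL model: it ignores the mixture over delays and the phase model, and fits only a one-dimensional Gaussian mixture to interaural level differences with EM. The function names `interaural_cues` and `em_soft_masks`, the quantile initialisation, and the variance floor are assumptions made for illustration.

```python
import numpy as np


def interaural_cues(left, right, eps=1e-12):
    """Interaural phase (rad) and level (dB) differences per spectrogram point.

    `left` and `right` are complex STFTs of the same shape (freq x time).
    """
    ipd = np.angle(left * np.conj(right))
    ild = 20.0 * np.log10((np.abs(left) + eps) / (np.abs(right) + eps))
    return ipd, ild


def em_soft_masks(ild, n_sources=2, n_iter=50):
    """Fit a 1-D Gaussian mixture to ILD values with EM (illustrative only).

    Returns soft masks of shape (n_sources, freq, time) and the fitted means.
    """
    x = ild.ravel()
    # Assumed initialisation: means at evenly spaced quantiles of the data.
    mu = np.quantile(x, (np.arange(n_sources) + 0.5) / n_sources)
    var = np.full(n_sources, x.var() + 1e-6)
    weight = np.full(n_sources, 1.0 / n_sources)

    for _ in range(n_iter):
        # E-step: posterior probability of each source at each point.
        log_p = (np.log(weight)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances.
        n_k = post.sum(axis=0) + 1e-12
        weight = n_k / x.size
        mu = (post * x[:, None]).sum(axis=0) / n_k
        var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / n_k + 1e-6

    masks = post.T.reshape((n_sources,) + ild.shape)
    return masks, mu
```

Multiplying one channel's STFT by `masks[k]` and inverting the transform would give a rough estimate of source `k`. The posteriors play the role of the probabilistic spectrogram masks mentioned in the abstract, under the simplifications stated above.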
Pages: 382-394
Page count: 13