A Supervised Learning Approach to Monaural Segregation of Reverberant Speech

Times cited: 82
Authors
Jin, Zhaozhang [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
Source
IEEE Transactions on Audio, Speech, and Language Processing | 2009 / Vol. 17 / No. 4
Keywords
Computational auditory scene analysis (CASA); monaural segregation; room reverberation; speech separation; supervised learning; neural-network classifiers; noise
DOI
10.1109/TASL.2008.2010633
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A major source of signal degradation in real environments is room reverberation, and monaural speech segregation in reverberant environments is a particularly challenging problem. Although inverse filtering has been proposed to partially restore the harmonicity of reverberant speech before segregation, this approach is sensitive to the specific source/receiver and room configuration. This paper proposes a supervised learning approach to monaural segregation of reverberant voiced speech. The approach learns a mapping from a set of pitch-based auditory features to a grouping cue that encodes the posterior probability of a time-frequency (T-F) unit being target-dominant given the observed features. We devise a novel objective function for the learning process that relates directly to the goal of maximizing the signal-to-noise ratio (SNR). Models trained with this objective function yield significantly better T-F unit labeling. A segmentation and grouping framework is then used to form reliable segments under reverberant conditions and organize them into streams. Systematic evaluations show that our approach produces very promising results under various reverberant conditions and generalizes well to new utterances and new speakers.
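The link the abstract draws between T-F unit labeling and SNR can be illustrated with a toy sketch. Everything here is an assumption rather than the paper's method: the ideal binary mask (IBM) with a 0 dB local criterion serves as the labeling target, per-unit target and interference energies are assumed to add without cross terms, SNR is measured against the IBM-masked mixture, and `energy_weighted_error` is a hypothetical stand-in for an SNR-related training objective, not the objective function proposed in the paper.

```python
import math


def ideal_binary_mask(target, interference, lc_db=0.0):
    """Label a T-F unit 1 (target-dominant) when its local SNR exceeds lc_db."""
    ratio = 10.0 ** (lc_db / 10.0)
    # Compare t > n * ratio instead of t / n > ratio to avoid division by zero.
    return [1 if t > n * ratio else 0 for t, n in zip(target, interference)]


def posterior_to_mask(posteriors, threshold=0.5):
    """Binarize classifier outputs P(target-dominant | features) into a mask."""
    return [1 if p > threshold else 0 for p in posteriors]


def snr_vs_ibm_db(mask, ibm, target, interference):
    """Output SNR in dB, using the IBM-masked mixture as the reference.

    Assumption: energies add within each unit, so every mislabeled unit
    contributes its full mixture energy to the error term.
    """
    mix = [t + n for t, n in zip(target, interference)]
    ref = sum(i * e for i, e in zip(ibm, mix))
    err = sum(abs(m - i) * e for m, i, e in zip(mask, ibm, mix))
    if err == 0:
        return math.inf
    if ref == 0:
        return -math.inf
    return 10.0 * math.log10(ref / err)


def energy_weighted_error(posteriors, ibm, target, interference):
    """Hypothetical cost: squared labeling error weighted by unit energy.

    Errors on high-energy units dominate the SNR above, so they cost more here.
    """
    return sum((p - i) ** 2 * (t + n)
               for p, i, t, n in zip(posteriors, ibm, target, interference))


if __name__ == "__main__":
    target = [4.0, 0.5, 2.0, 0.1]        # toy per-unit target energies
    interference = [1.0, 2.0, 1.0, 0.4]  # toy per-unit interference energies
    ibm = ideal_binary_mask(target, interference)  # [1, 0, 1, 0]
    posteriors = [0.9, 0.6, 0.7, 0.2]              # toy classifier outputs
    mask = posterior_to_mask(posteriors)           # [1, 1, 1, 0]
    print(snr_vs_ibm_db(mask, ibm, target, interference))  # about 5.05 dB
    print(energy_weighted_error(posteriors, ibm, target, interference))
```

In this toy case the single mislabeled unit (the second one) carries 2.5 units of mixture energy, which alone sets the error term of the SNR. An unweighted labeling accuracy would score all errors equally, which is why an energy-aware cost is a plausible route toward an SNR-oriented objective.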
Pages: 625-638
Number of pages: 14