AN SVM BASED CLASSIFICATION APPROACH TO SPEECH SEPARATION

Cited by: 0
Authors
Han, Kun [1]
Wang, DeLiang [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Source
2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2011
Keywords
Speech separation; IBM; SVM; Re-thresholding; Segmentation; INTELLIGIBILITY; SEGREGATION; NOISE
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Monaural speech separation is a very challenging task. Systems based on computational auditory scene analysis (CASA) utilize acoustic features to produce a time-frequency (T-F) mask. In this study, we propose a classification approach to the monaural separation problem. Our feature set consists of pitch-based features and amplitude modulation spectrum features, which can discriminate both voiced and unvoiced speech from nonspeech interference. We employ support vector machines (SVMs) followed by a re-thresholding method to classify each T-F unit as either target-dominated or interference-dominated. An auditory segmentation stage is then utilized to improve the SVM-generated results. Systematic evaluations show that our approach produces high-quality binary masks and outperforms a previous system in terms of classification accuracy.
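The abstract does not say how the re-thresholding step picks its threshold. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each T-F unit already has a real-valued SVM decision value, and that the new threshold is chosen on held-out labeled units to maximize the hit rate minus the false-alarm rate. The function names, the selection criterion, and the toy numbers are all hypothetical.

```python
def apply_threshold(decision_values, theta):
    """Label a T-F unit 1 (target-dominated) if its decision value is >= theta."""
    return [1 if d >= theta else 0 for d in decision_values]


def hit_minus_fa(labels, predictions):
    """Hit rate minus false-alarm rate for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    hits = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_alarms = sum(1 for y, p in pairs if y == 0 and p == 1)
    n_target = sum(1 for y in labels if y == 1)
    n_interference = len(labels) - n_target
    hit_rate = hits / n_target if n_target else 0.0
    fa_rate = false_alarms / n_interference if n_interference else 0.0
    return hit_rate - fa_rate


def rethreshold(decision_values, labels):
    """Return the threshold that maximizes HIT - FA (assumed criterion).

    Starts from 0.0, the default SVM decision boundary, and only moves
    away from it when a candidate threshold scores strictly higher.
    """
    best_theta = 0.0
    best_score = hit_minus_fa(labels, apply_threshold(decision_values, 0.0))
    for theta in sorted(set(decision_values)):
        score = hit_minus_fa(labels, apply_threshold(decision_values, theta))
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta


# Hypothetical held-out data: decision values and ideal labels for five T-F units.
held_out_values = [-1.5, -0.8, -0.3, 0.2, 0.9]
held_out_labels = [0, 0, 1, 1, 1]

theta = rethreshold(held_out_values, held_out_labels)
mask = apply_threshold(held_out_values, theta)
```

In this toy case the default boundary at 0.0 misses the target-dominated unit with decision value -0.3, and the selected threshold recovers it without adding a false alarm. Whether the threshold is tuned once or separately for each frequency channel is a further design choice that the abstract leaves open.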
Pages: 4632-4635
Number of pages: 4
Related papers
14 in total
[1]  
[Anonymous], 1969, IEEE T ACOUST SPEECH, VAU17, P225
[2]   Determination of the potential benefit of time-frequency gain manipulation [J].
Anzalone, Michael C. ;
Calandruccio, Lauren ;
Doherty, Karen A. ;
Carney, Laurel H. .
EAR AND HEARING, 2006, 27 (05) :480-492
[3]  
Boersma P., 2007, PRAAT DOING PHONETIC
[4]   Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation [J].
Brungart, Douglas S. ;
Chang, Peter S. ;
Simpson, Brian D. ;
Wang, DeLiang .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120 (06) :4007-4018
[5]   LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung ;
Lin, Chih-Jen .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
[6]   Auditory segmentation based on onset and offset analysis [J].
Hu, Guoning ;
Wang, DeLiang .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (02) :396-405
[7]   A MULTIPITCH TRACKING ALGORITHM FOR NOISY AND REVERBERANT SPEECH [J].
Jin, Zhaozhang ;
Wang, DeLiang .
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, :4218-4221
[8]   A Supervised Learning Approach to Monaural Segregation of Reverberant Speech [J].
Jin, Zhaozhang ;
Wang, DeLiang .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (04) :625-638
[9]   An algorithm that improves speech intelligibility in noise for normal-hearing listeners [J].
Kim, Gibak ;
Lu, Yang ;
Hu, Yi ;
Loizou, Philipos C. .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2009, 126 (03) :1486-1494
[10]   Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction [J].
Li, Ning ;
Loizou, Philipos C. .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 123 (03) :1673-1682