AN SVM BASED CLASSIFICATION APPROACH TO SPEECH SEPARATION

Times cited: 0

Authors:

Han, Kun [1]

Wang, DeLiang [1]

Affiliation:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

Source:

2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2011

Keywords:

Speech separation; IBM; SVM; Re-thresholding; Segmentation; INTELLIGIBILITY; SEGREGATION; NOISE;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Monaural speech separation is a very challenging task. Systems based on computational auditory scene analysis (CASA) use acoustic features to produce a time-frequency (T-F) mask. In this study, we propose a classification approach to the monaural separation problem. Our feature set consists of pitch-based features and amplitude modulation spectrum features, which can discriminate both voiced and unvoiced speech from nonspeech interference. We employ support vector machines (SVMs) followed by a re-thresholding method to classify each T-F unit as either target-dominated or interference-dominated. An auditory segmentation stage is then used to improve the SVM-generated results. Systematic evaluations show that our approach produces high-quality binary masks and outperforms a previous system in terms of classification accuracy.
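The per-unit labeling described in the abstract (each T-F unit marked target-dominated or interference-dominated, with SVM outputs then re-thresholded) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ideal binary mask (IBM) label uses an assumed local criterion `lc_db` of 0 dB, the SVM decision values are taken as given (no SVM is trained here), and the re-thresholding rule shown (pick the threshold that maximizes HIT-FA on held-out labeled units) is a hypothetical stand-in, since the abstract does not specify the rule. All function names are illustrative.

```python
import numpy as np


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Label a T-F unit 1 (target-dominated) when its local SNR exceeds lc_db.

    lc_db (local criterion, in dB) is an assumed parameter; 0 dB is a common choice.
    """
    target_energy = np.asarray(target_energy, dtype=float)
    noise_energy = np.asarray(noise_energy, dtype=float)
    eps = np.finfo(float).tiny  # avoids division by zero and log10(0) for silent units
    snr_db = 10.0 * np.log10((target_energy + eps) / (noise_energy + eps))
    return (snr_db > lc_db).astype(int)


def hit_minus_fa(estimated, ideal):
    """HIT = share of target-dominated units labeled 1; FA = share of
    interference-dominated units wrongly labeled 1. Returns HIT - FA."""
    estimated = np.asarray(estimated, dtype=bool)
    ideal = np.asarray(ideal, dtype=bool)
    hit = estimated[ideal].mean() if ideal.any() else 0.0
    fa = estimated[~ideal].mean() if (~ideal).any() else 0.0
    return float(hit - fa)


def rethreshold(decision_values, ideal, candidates):
    """Hypothetical re-thresholding rule: return the candidate threshold on the
    SVM decision values that maximizes HIT - FA against held-out IBM labels."""
    decision_values = np.asarray(decision_values, dtype=float)
    best_t, best_score = None, -np.inf
    for t in candidates:
        score = hit_minus_fa(decision_values > t, ideal)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy example: 2 frequency channels x 2 time frames.
target = np.array([[4.0, 0.1], [1.0, 2.0]])
noise = np.array([[1.0, 1.0], [1.0, 0.5]])
ibm = ideal_binary_mask(target, noise)  # [[1, 0], [0, 1]]
svm_scores = np.array([[0.3, -0.2], [0.1, 0.8]])  # made-up decision values
threshold, score = rethreshold(svm_scores, ibm, [-0.5, 0.0, 0.2, 0.5])
print(ibm.tolist(), threshold, score)
```

Because the IBM is computed from the premixed target and interference, it is only available as a training or evaluation label; at test time only the thresholded classifier outputs would be used to form the binary mask.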


Pages: 4632-4635

Number of pages: 4

References (14 in total):

[1] [Anonymous], 1969, IEEE T ACOUST SPEECH, VAU17, P225

[2] Determination of the potential benefit of time-frequency gain manipulation [J].
Anzalone, Michael C.; Calandruccio, Lauren; Doherty, Karen A.; Carney, Laurel H.
EAR AND HEARING, 2006, 27(05): 480-492

[3] Boersma P., 2007, PRAAT DOING PHONETIC

[4] Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation [J].
Brungart, Douglas S.; Chang, Peter S.; Simpson, Brian D.; Wang, DeLiang.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120(06): 4007-4018

[5] LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung; Lin, Chih-Jen.
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2(03)

[6] Auditory segmentation based on onset and offset analysis [J].
Hu, Guoning; Wang, DeLiang.
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15(02): 396-405

[7] A MULTIPITCH TRACKING ALGORITHM FOR NOISY AND REVERBERANT SPEECH [J].
Jin, Zhaozhang; Wang, DeLiang.
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010: 4218-4221

[8] A Supervised Learning Approach to Monaural Segregation of Reverberant Speech [J].
Jin, Zhaozhang; Wang, DeLiang.
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17(04): 625-638

[9] An algorithm that improves speech intelligibility in noise for normal-hearing listeners [J].
Kim, Gibak; Lu, Yang; Hu, Yi; Loizou, Philipos C.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2009, 126(03): 1486-1494

[10] Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction [J].
Li, Ning; Loizou, Philipos C.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 123(03): 1673-1682
