Segregation of unvoiced speech from nonspeech interference

Cited: 49
Authors
Hu, Guoning [1 ]
Wang, DeLiang [2 ,3 ]
Affiliations
[1] Ohio State Univ, Biophys Program, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[3] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
Funding
US National Science Foundation;
DOI
10.1121/1.2939132
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure and has weaker energy, making it more susceptible to interference. This study proposes a new approach to the problem of segregating unvoiced speech from nonspeech interference. The study first addresses the question of how much speech is unvoiced. The segregation process occurs in two stages: segmentation and grouping. In segmentation, the proposed model decomposes an input mixture into contiguous time-frequency segments by a multiscale analysis of event onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model for unvoiced speech segregation joins an existing model for voiced speech segregation to produce an overall system that can deal with both voiced and unvoiced speech. Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction. (C) 2008 Acoustical Society of America.
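The grouping stage described in the abstract can be illustrated with a minimal Gaussian naive-Bayes sketch: each time-frequency segment is summarized by a small acoustic-phonetic feature vector, and the classifier decides whether the segment belongs to unvoiced speech or to interference. This is only a schematic under assumed features and toy data; the feature names, values, and the two-class labels below are illustrative assumptions, not taken from the paper.

```python
import math

def fit_gaussian(values):
    """Per-feature mean and variance for one class (naive-Bayes parameters)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values) or 1e-6
    return mean, var

def log_likelihood(x, mean, var):
    """Log of the Gaussian density of feature value x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def train(segments, labels):
    """segments: list of feature vectors; labels: class name per segment."""
    model = {}
    for cls in set(labels):
        rows = [s for s, l in zip(segments, labels) if l == cls]
        log_prior = math.log(len(rows) / len(segments))
        params = [fit_gaussian(col) for col in zip(*rows)]  # one (mean, var) per feature
        model[cls] = (log_prior, params)
    return model

def classify(model, features):
    """Pick the class maximizing log prior + summed per-feature log likelihoods."""
    def score(cls):
        log_prior, params = model[cls]
        return log_prior + sum(log_likelihood(x, m, v)
                               for x, (m, v) in zip(features, params))
    return max(model, key=score)

# Toy example: two hypothetical features per segment (e.g. spectral flatness
# and band-energy ratio), three labeled segments per class.
model = train(
    [(0.80, 0.20), (0.90, 0.30), (0.85, 0.25),   # unvoiced-speech segments
     (0.20, 0.70), (0.10, 0.80), (0.15, 0.75)],  # interference segments
    ["speech", "speech", "speech", "noise", "noise", "noise"],
)
print(classify(model, (0.88, 0.22)))
```

In this sketch each segment is classified independently; the paper's actual grouping operates on segments produced by the onset/offset segmentation stage and uses features chosen for unvoiced phonemes.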
Pages: 1306-1319
Page count: 14