Human phoneme recognition depending on speech-intrinsic variability

Cited by: 31
Authors
Meyer, Bernd T. [1 ]
Juergens, Tim [1 ]
Wesker, Thorsten [1 ]
Brand, Thomas [1 ]
Kollmeier, Birger [1 ]
Affiliation
[1] Carl von Ossietzky Univ Oldenburg, D-26111 Oldenburg, Germany
Keywords
CONSONANT RECOGNITION; SPEAKING RATE; CLEAR SPEECH; NOISE; INTELLIGIBILITY; CONFUSIONS; HEARING; MODEL
DOI
10.1121/1.3493450
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The influence of different sources of speech-intrinsic variation (speaking rate, effort, style, and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR). © 2010 Acoustical Society of America. [DOI: 10.1121/1.3493450]
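The transmitted-information analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard mutual-information formulation applied to a confusion-count matrix; the exact procedure used in the paper is not given in this record. The function names, the toy counts, and the voicing labels are hypothetical and serve only as an example.

```python
import math


def transmitted_information(confusions):
    """Return (T, H_x, T_rel) in bits for a confusion-count matrix.

    confusions[i][j] is the number of times stimulus i was reported
    as response j. T is the mutual information between stimulus and
    response, H_x the stimulus entropy, and T_rel = T / H_x.
    """
    total = sum(sum(row) for row in confusions)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    n_cols = len(confusions[0])
    row_p = [sum(row) / total for row in confusions]
    col_p = [sum(row[j] for row in confusions) / total for j in range(n_cols)]
    t = 0.0
    for i, row in enumerate(confusions):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing
                p = n / total
                t += p * math.log2(p / (row_p[i] * col_p[j]))
    h_x = -sum(p * math.log2(p) for p in row_p if p > 0)
    t_rel = t / h_x if h_x > 0 else 0.0
    return t, h_x, t_rel


def collapse_by_feature(confusions, labels, feature_of):
    """Pool a phoneme confusion matrix into a feature confusion matrix.

    labels[i] names row/column i; feature_of maps each label to a
    feature value (hypothetical example: voiced / unvoiced).
    """
    values = sorted(set(feature_of[label] for label in labels))
    index = {value: k for k, value in enumerate(values)}
    pooled = [[0] * len(values) for _ in values]
    for i, row in enumerate(confusions):
        for j, n in enumerate(row):
            pooled[index[feature_of[labels[i]]]][index[feature_of[labels[j]]]] += n
    return values, pooled


# Toy data, not taken from the paper: four consonants, made-up counts.
labels = ["p", "b", "t", "d"]
voicing = {"p": "unvoiced", "b": "voiced", "t": "unvoiced", "d": "voiced"}
counts = [
    [6, 1, 3, 0],
    [1, 7, 0, 2],
    [3, 0, 6, 1],
    [0, 2, 1, 7],
]
values, voicing_matrix = collapse_by_feature(counts, labels, voicing)
t, h_x, t_rel = transmitted_information(voicing_matrix)
print(values, voicing_matrix, round(t_rel, 3))
```

A relative value near 1 means the feature is transmitted almost perfectly, and a value near 0 means responses carry almost no information about it. In the toy counts most errors stay within the same voicing class, so voicing is transmitted better than phoneme identity.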
Pages: 3126-3141
Number of pages: 16