Automatic speech recognition and speech variability: A review

Cited by: 312
Authors
Benzeghiba, M. [1]
De Mori, R. [1]
Deroo, O. [1]
Dupont, S. [1]
Erbes, T. [1]
Jouvet, D. [1]
Fissore, L. [1]
Laface, P. [1]
Mertins, A. [1]
Ris, C. [1]
Rose, R. [1]
Tyagi, V. [1]
Wellekens, C. [1]
Affiliation
[1] Multitel, Parc Initialis, B-7000 Mons, Belgium
Keywords
speech recognition; speech analysis; speech modeling; speech intrinsic variations
DOI
10.1016/j.specom.2007.02.006
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Major progress is being recorded regularly on both the technology and exploitation of automatic speech recognition (ASR) and spoken language systems. However, there are still technological barriers to flexible solutions and user satisfaction under some circumstances. This is related to several factors, such as the sensitivity to the environment (background noise), or the weak representation of grammatical and semantic knowledge. Current research is also emphasizing deficiencies in dealing with variation naturally present in speech. For instance, the lack of robustness to foreign accents precludes the use by specific populations. Also, some applications, like directory assistance, particularly stress the core recognition technology due to the very high active vocabulary (application perplexity). There are actually many factors affecting the speech realization: regional, sociolinguistic, or related to the environment or the speaker herself. These create a wide range of variations that may not be modeled correctly (speaker, gender, speaking rate, vocal effort, regional accent, speaking style, non-stationarity, etc.), especially when resources for system training are scarce. This paper outlines current advances related to these topics. © 2007 Elsevier B.V. All rights reserved.
Pages: 763-786
Page count: 24