Using broad phonetic group-experts for improved speech recognition

Cited by: 38
Authors
Scanlon, Patricia
Ellis, Daniel P. W.
Reilly, Richard B.
Affiliations
[1] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
[2] Univ Coll Dublin, Sch Elect Elect & Mech Engn, Dublin 4, Ireland
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007, Vol. 15, No. 3
Keywords
automatic speech recognition; broad phonetic groups (BPGs); mixture of experts; mutual information (MI)
DOI
10.1109/TASL.2006.885907
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In phoneme recognition experiments, it was found that approximately 75% of misclassified frames were assigned labels within the same broad phonetic group (BPG). While the phoneme can be described as the smallest distinguishable unit of speech, phonemes within a BPG share very similar characteristics and are easily confused. Different BPGs, such as vowels and stops, however, possess very different spectral and temporal characteristics. In order to accommodate the full range of phonemes, the acoustic models of speech recognition systems calculate input features from all frequencies over a large temporal context window. A new phoneme classifier is proposed consisting of a modular arrangement of experts, with one expert assigned to each BPG and focused on discriminating between phonemes within that BPG. Because each BPG has a different temporal and spectral structure, mutual information is used to select a relevant time-frequency (TF) feature set for each expert. To construct a phone recognition system, the output of each expert is combined with a baseline classifier under the guidance of a separate BPG detector. In phoneme recognition experiments on the TIMIT continuous speech corpus, the proposed architecture afforded significant error rate reductions of up to 5% relative.
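The combination step described in the abstract can be illustrated with a minimal sketch for a single frame. The abstract does not give the merging rule, so everything here is an assumption made for illustration: the four-phoneme inventory `BPG_OF`, the function name `combine_posteriors`, the linear interpolation weight `alpha`, and all probability values are hypothetical. The sketch gates each expert's within-group posterior by the BPG detector's posterior for that group, then blends the result with the baseline posterior.

```python
from typing import Dict

# Hypothetical toy inventory: each phoneme belongs to exactly one
# broad phonetic group (BPG). Not taken from the paper.
BPG_OF: Dict[str, str] = {"iy": "vowel", "ae": "vowel", "p": "stop", "t": "stop"}


def combine_posteriors(
    baseline: Dict[str, float],
    bpg_detector: Dict[str, float],
    experts: Dict[str, Dict[str, float]],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Blend baseline posteriors with BPG-gated expert posteriors (one frame).

    Assumed rule (not confirmed by the abstract):
        P(q|x) ~ alpha * P_base(q|x) + (1 - alpha) * P(g(q)|x) * P_expert(q|x, g(q))
    where g(q) is the group that phoneme q belongs to.
    """
    combined = {}
    for phone, group in BPG_OF.items():
        # Expert posterior within its group, weighted by the detector's
        # belief that the frame belongs to that group.
        gated = bpg_detector[group] * experts[group][phone]
        combined[phone] = alpha * baseline[phone] + (1.0 - alpha) * gated
    # Renormalize so the result is a proper distribution over phonemes.
    total = sum(combined.values())
    return {phone: p / total for phone, p in combined.items()}


# Made-up frame: the baseline narrowly prefers "iy", but the vowel expert,
# which only discriminates among vowels, strongly prefers "ae".
baseline = {"iy": 0.40, "ae": 0.35, "p": 0.15, "t": 0.10}
bpg_detector = {"vowel": 0.9, "stop": 0.1}
experts = {
    "vowel": {"iy": 0.2, "ae": 0.8},
    "stop": {"p": 0.5, "t": 0.5},
}

result = combine_posteriors(baseline, bpg_detector, experts)
best = max(result, key=result.get)
print(best)  # "ae": the within-group confusion is resolved by the expert
```

In this toy frame the baseline and the expert agree on the broad group (vowel) but disagree on the phoneme, which is the within-BPG confusion the abstract reports for roughly 75% of misclassified frames.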
Pages: 803-812
Page count: 10