An HMM-based speech recognizer using overlapping articulatory features

Cited by: 18
Authors
Erler, K
Freeman, GH
Affiliation
[1] Department of Electrical and Computer Engineering, University of Waterloo, Waterloo
DOI
10.1121/1.417358
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
State-of-the-art speech recognition uses stochastic models (hidden Markov models, HMMs) to represent small, non-overlapping segments of speech (phonemes or allophones). In these traditional HMM speech recognizers, the control strategy draws little on the underlying structure of speech; instead, it models speech as a set of disjoint "segmental" units. Such a strategy does not easily accommodate the influence that phonemes have on neighboring phonemes, nor does it attach any meaning to the internal states of the model. This work presents an alternative HMM control strategy based on the idea that speech production is a process governed by the mechanical motion of a set of relatively slow-moving "articulators." The articulatory feature model is defined as an HMM in which each internal state represents one possible configuration of the (quantized) articulatory system. Rather than modeling disjoint segments, the model represents the acoustic patterns associated with the various articulatory configurations of the speech production system, and instead of a set of separate models, it represents the entire vocabulary with a single, large HMM. The internal states thus acquire meaning through their correspondence with the physical state of the production system, which allows linguistic and physiological knowledge to be incorporated to improve performance. The system philosophy, implementation, and results are discussed. © 1996 Acoustical Society of America.
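The abstract describes the model at the level of its state space only. The following is a minimal sketch of that idea, not the authors' implementation: the four features, their quantization levels, the rule that only one feature may move by one level per frame (a stand-in for "slow-moving articulators"), the self-loop probability, and the synthetic emission scores are all hypothetical choices made for illustration. Each state is one articulatory configuration, a single transition matrix spans all states (standing in for the one vocabulary-wide HMM, with lexical constraints omitted), and ordinary Viterbi decoding returns a sequence of physical configurations rather than anonymous state indices.

```python
import itertools
import math

# Hypothetical quantized articulatory features (name -> number of levels).
# Illustrative only; not the feature set used in the paper.
FEATURES = {"lips": 3, "tongue_body": 3, "velum": 2, "glottis": 2}


def build_states(features):
    """Each HMM state is one quantized configuration of every articulator."""
    return list(itertools.product(*(range(n) for n in features.values())))


def allowed(a, b):
    """Assumed slow-articulator constraint: between consecutive frames,
    exactly one feature moves, and only by one quantization level."""
    return sum(abs(x - y) for x, y in zip(a, b)) == 1


def transition_matrix(states, self_loop=0.6):
    """Row-stochastic transitions over one large HMM covering all states."""
    n = len(states)
    A = [[0.0] * n for _ in range(n)]
    for i, s in enumerate(states):
        nbrs = [j for j, t in enumerate(states) if allowed(s, t)]
        A[i][i] = self_loop
        for j in nbrs:
            A[i][j] = (1.0 - self_loop) / len(nbrs)
    return A


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def viterbi(A, pi, log_B):
    """Most likely state path; log_B[t][i] is the log-likelihood of
    acoustic frame t under articulatory state i."""
    n = len(pi)
    delta = [safe_log(pi[i]) + log_B[0][i] for i in range(n)]
    back = []
    for frame in log_B[1:]:
        ptr, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] + safe_log(A[i][j]))
            ptr.append(best)
            new.append(delta[best] + safe_log(A[best][j]) + frame[j])
        back.append(ptr)
        delta = new
    path = [max(range(n), key=lambda i: delta[i])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    states = build_states(FEATURES)
    A = transition_matrix(states)
    pi = [1.0 / len(states)] * len(states)
    # Synthetic emissions (made-up scores) favouring the lips opening one level per frame.
    target = [(0, 0, 0, 0), (1, 0, 0, 0), (2, 0, 0, 0)]
    log_B = [[0.0 if s == t else -10.0 for s in states] for t in target]
    print([states[i] for i in viterbi(A, pi, log_B)])
    # prints [(0, 0, 0, 0), (1, 0, 0, 0), (2, 0, 0, 0)]
```

Because the decoded path is a sequence of configuration tuples, physiological knowledge can be expressed directly as constraints on the transition rule (here, the hypothetical `allowed` function) rather than being buried in per-phoneme model topologies.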
Pages: 2500-2513
Page count: 14