ISOLATED-UTTERANCE SPEECH RECOGNITION USING HIDDEN MARKOV-MODELS WITH BOUNDED STATE DURATIONS

Cited by: 34
Authors
GU, HY [1]
TSENG, CY [1]
LEE, LS [1]
Affiliations
[1] ACAD SINICA, INST HIST & PHILOL, TAIPEI 115, TAIWAN
DOI
10.1109/78.91145
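Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code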
0808 ; 0809 ;
Abstract
This paper proposes hidden Markov models (HMM's) with bounded state durations (HMM/BSD's), which model the state durations of HMM's explicitly and capture the temporal structure of speech signals in a simple, direct, and effective way. In the recognition phase, the duration of each state of an HMM/BSD is simply bounded below and above by two bounding parameters, in contrast to earlier approaches that model state durations with Poisson, gamma, or other distributions. These bounding parameters can be estimated during the training phase. A series of experiments was conducted for speaker-dependent applications, using all 408 highly confusable first-tone Mandarin syllables as the example vocabulary. In the discrete case, the recognition rate of HMM/BSD (78.5%) is 9.0%, 6.3%, and 1.9% higher than those of conventional HMM's and of HMM's with Poisson and gamma distributed state durations, respectively. In the continuous case (partitioned Gaussian mixture modeling), the recognition rates of HMM/BSD (88.3% with 1 mixture, 88.8% with 3 mixtures, and 89.4% with 5 mixtures) are 6.3%, 5.0%, and 5.5% higher than those of conventional HMM's. They are also 5.9% (1 mixture) and 3.9% (3 mixtures) higher than those of HMM's with Poisson distributed state durations, and 3.1% (1 mixture) and 1.8% (3 mixtures) higher than those of HMM's with gamma distributed state durations. As to computational complexity and recognition speed, the proposed method requires much less computation than HMM's with Poisson or gamma distributed state durations.
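The idea of bounding each state's duration during recognition can be illustrated with a small dynamic-programming sketch. This is not the authors' algorithm: it assumes a strictly left-to-right model in which every state is visited once in order, ignores transition probabilities, and takes per-frame log-likelihoods as given. The function name `bounded_duration_score` and the arguments `log_emit`, `min_dur`, and `max_dur` are hypothetical names chosen for this illustration.

```python
NEG_INF = float("-inf")

def bounded_duration_score(log_emit, min_dur, max_dur):
    """Best log-score of a left-to-right path that visits every state in order,
    with state j occupying between min_dur[j] and max_dur[j] frames (min >= 1).

    log_emit[t][j] is the log-likelihood of frame t under state j (assumed given).
    Returns (score, durations); score is -inf and durations is [] if no
    segmentation of the frames satisfies the bounds.
    """
    T = len(log_emit)
    N = len(min_dur)

    # prefix[j][t] = sum of log_emit[0..t-1][j], so any segment score is O(1).
    prefix = [[0.0] * (T + 1) for _ in range(N)]
    for j in range(N):
        for t in range(T):
            prefix[j][t + 1] = prefix[j][t] + log_emit[t][j]

    # best[j][t] = best score when the first j states cover exactly frames 0..t-1.
    best = [[NEG_INF] * (T + 1) for _ in range(N + 1)]
    back = [[0] * (T + 1) for _ in range(N + 1)]
    best[0][0] = 0.0
    for j in range(1, N + 1):
        lo, hi = min_dur[j - 1], max_dur[j - 1]
        for t in range(1, T + 1):
            # Only durations inside the bounds are ever considered.
            for d in range(lo, min(hi, t) + 1):
                prev = best[j - 1][t - d]
                if prev == NEG_INF:
                    continue
                cand = prev + prefix[j - 1][t] - prefix[j - 1][t - d]
                if cand > best[j][t]:
                    best[j][t] = cand
                    back[j][t] = d

    if best[N][T] == NEG_INF:
        return NEG_INF, []

    # Trace back the chosen duration of each state.
    durations = []
    t = T
    for j in range(N, 0, -1):
        d = back[j][t]
        durations.append(d)
        t -= d
    durations.reverse()
    return best[N][T], durations
```

In this sketch the bounds act as hard constraints on the search: a segmentation that keeps a state for fewer than `min_dur[j]` or more than `max_dur[j]` frames is never scored, so no duration density has to be evaluated inside the inner loop. The bounds themselves are assumed to come from training, as the abstract states.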
Pages: 1743-1752
Page count: 10