ISOLATED-UTTERANCE SPEECH RECOGNITION USING HIDDEN MARKOV-MODELS WITH BOUNDED STATE DURATIONS

Cited by: 34

Authors:

GU, HY [1]

TSENG, CY [1]

LEE, LS [1]

Affiliations:

[1] ACAD SINICA, INST HIST & PHILOL, TAIPEI 115, TAIWAN

Source:

IEEE TRANSACTIONS ON SIGNAL PROCESSING | 1991 / Vol. 39 / No. 08

Keywords:

DOI:

10.1109/78.91145

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In this paper, hidden Markov models (HMM's) with bounded state durations (HMM/BSD's) are proposed to model the state durations of HMM's explicitly and to account more accurately for the temporal structures in speech signals in a simple, direct, but effective way. In the recognition phase, the duration of each state of an HMM/BSD is simply bounded below and above by two bounding parameters, in contrast to previously proposed approaches that model state durations with Poisson, gamma, or other distributions. These bounding parameters can be estimated during the training phase. A series of experiments has been conducted for speaker-dependent applications using all 408 highly confusing first-tone Mandarin syllables as the example vocabulary. In the discrete case, the recognition rate of HMM/BSD (78.5%) was found to be 9.0%, 6.3%, and 1.9% higher than those of the conventional HMM's and of HMM's with Poisson and gamma distributed state durations, respectively. In the continuous case (partitioned Gaussian mixture modeling), the recognition rates of HMM/BSD (88.3% with 1 mixture, 88.8% with 3 mixtures, and 89.4% with 5 mixtures) are 6.3%, 5.0%, and 5.5% higher, respectively, than those of the conventional HMM's. They are also 5.9% (with 1 mixture) and 3.9% (with 3 mixtures) higher than those of HMM's with Poisson distributed state durations, and 3.1% (with 1 mixture) and 1.8% (with 3 mixtures) higher than those of HMM's with gamma distributed state durations. As for computational complexity and recognition speed, the new modeling method requires much less computation than HMM's with Poisson or gamma distributed state durations.
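The idea of bounding each state's duration can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a left-to-right model with no state skipping, ignores transition probabilities, and takes precomputed per-frame log-likelihoods. The function names `estimate_bounds` and `bounded_duration_viterbi` are hypothetical, and taking the minimum and maximum observed durations as bounds is only one plausible way to estimate them from training segmentations.

```python
import math


def estimate_bounds(segmentations):
    """Assumed estimator: per-state min and max duration seen in training.

    segmentations: list of per-utterance duration lists, one entry per state.
    """
    lo = [min(durs) for durs in zip(*segmentations)]
    hi = [max(durs) for durs in zip(*segmentations)]
    return lo, hi


def bounded_duration_viterbi(log_emit, lo, hi):
    """Best left-to-right state segmentation with lo[i] <= duration_i <= hi[i].

    log_emit[t][i] is the log-likelihood of frame t under state i.
    Returns (best_log_score, durations), or (-inf, None) if no segmentation
    satisfies the bounds. Assumes every lo[i] >= 1.
    """
    T, N = len(log_emit), len(lo)
    # prefix[i][t] = sum of log_emit[0..t-1][i], so any span sum is O(1).
    prefix = [[0.0] * (T + 1) for _ in range(N)]
    for i in range(N):
        for t in range(T):
            prefix[i][t + 1] = prefix[i][t] + log_emit[t][i]

    neg_inf = -math.inf
    # best[i][t]: best score with states 0..i-1 covering exactly t frames.
    best = [[neg_inf] * (T + 1) for _ in range(N + 1)]
    back = [[0] * (T + 1) for _ in range(N + 1)]
    best[0][0] = 0.0
    for i in range(N):
        for t in range(T + 1):
            if best[i][t] == neg_inf:
                continue
            for d in range(lo[i], hi[i] + 1):  # only durations inside the bounds
                end = t + d
                if end > T:
                    break
                score = best[i][t] + prefix[i][end] - prefix[i][t]
                if score > best[i + 1][end]:
                    best[i + 1][end] = score
                    back[i + 1][end] = d

    if best[N][T] == neg_inf:
        return neg_inf, None
    # Trace back the chosen duration of each state, last state first.
    durations, t = [], T
    for i in range(N, 0, -1):
        durations.append(back[i][t])
        t -= back[i][t]
    durations.reverse()
    return best[N][T], durations
```

In this sketch the bounds act as hard constraints on the search rather than as a duration probability term, which is why no Poisson or gamma density has to be evaluated per candidate duration.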


Pages: 1743-1752

Page count: 10

References: 25 in total

[1] INCORPORATION OF TEMPORAL STRUCTURE INTO A VECTOR-QUANTIZATION-BASED PREPROCESSOR FOR SPEAKER-INDEPENDENT, ISOLATED-WORD RECOGNITION [J]. BERGH, AF; SOONG, FK; RABINER, LR. AT&T TECHNICAL JOURNAL, 1985, 64 (05): 1047-1063

[2] ISOLATED-WORD SPEECH RECOGNITION USING MULTISECTION VECTOR QUANTIZATION CODEBOOKS [J]. BURTON, DK; SHORE, JE; BUCK, JT. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1985, 33 (04): 837-849

[3] SPEECH CODING BASED UPON VECTOR QUANTIZATION [J]. BUZO, A; GRAY, AH; GRAY, RM; MARKEL, JD. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (05): 562-574

[4] Duda R. O., 1973, PATTERN CLASSIFICATI, V3

[5] A VQ-BASED PREPROCESSOR USING CEPSTRAL DYNAMIC FEATURES FOR SPEAKER-INDEPENDENT LARGE VOCABULARY WORD RECOGNITION [J]. FURUI, S. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1988, 36 (07): 980-987

[6] DISTANCE MEASURES FOR SPEECH PROCESSING [J]. GRAY, AH; MARKEL, JD. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1976, 24 (05): 380-391

[7] Gray R. M., 1984, IEEE ASSP Magazine, V1, P4, DOI 10.1109/MASSP.1984.1162229

[8] MINIMUM PREDICTION RESIDUAL PRINCIPLE APPLIED TO SPEECH RECOGNITION [J]. ITAKURA, F. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1975, AS23 (01): 67-72

[9] JUANG BH, 1987, IEEE T ACOUST SPEECH, V35, P947, DOI 10.1109/TASSP.1987.1165237

[10] MIXTURE AUTOREGRESSIVE HIDDEN MARKOV-MODELS FOR SPEECH SIGNALS [J]. JUANG, BH; RABINER, LR. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1985, 33 (06): 1404-1413
