Addable Stress Speech Recognition with Multiplexing HMM: Training and Non-training Decision

Cited by: 0
Authors
Amornkul, Pakapong [1]
Chamnongthai, Kosin [1]
Temdee, Punnarumol [2]
Affiliations
[1] King Mongkuts Univ Technol Thonburi, Elect & Telecommun Engn Dept, Bangkok, Thailand
[2] Mae Fah Luang Univ, Sch Informat Technol, Chiang Rai, Thailand
Keywords
Speech recognition; Hidden Markov model; Stress speech recognition; Support vector machine; HIDDEN MARKOV-MODELS; SPEAKER ADAPTATION; NOISE; CLASSIFICATION; COMPENSATION; ENHANCEMENT
DOI
10.1007/s11277-014-1721-3
CLC number
TN [Electronic technology, communication technology]
Subject classification code
0809
Abstract
In stress speech recognition, a model that can process multi-stress speech must be designed with both accuracy and addability in mind. This paper proposes addable stress speech recognition with a multiplexing hidden Markov model (HMM). To handle multi-stress speech, we propose a multiplexing topology that combines multiple stress speech models. Because each stress affects speech in a different way, a recognition model trained specifically on words affected by that stress helps improve the recognition rate. However, since each stress speech model outputs its own independently recognized word, an effective decision module is needed to choose the correct word. In each stress speech model, mel-frequency cepstral coefficient (MFCC) extraction is applied to the input speech, and the result is fed into an HMM that is segmented into N parts. Each segment provides its own tentative recognized word, which in turn is an input to the proposed non-training decision module. Based on the tentative recognized words from the segments of all stress speech models, the final recognized word is decided using a coarse-to-fine concept carried out by a majority vote, a segment-weighted difference square score, and a next best score, in that order. Besides neutral speech, the proposed method was verified using three stresses: angry, loud, and Lombard. The results showed that the proposed method achieved a 94.7% recognition rate, compared with 94.2% for the training-based decision method.
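The coarse stage of the non-training decision described in the abstract, a majority vote over the tentative words from every segment of every stress model, can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `majority_candidates`, `decide`, and `tie_breaker` are hypothetical, and the finer stages (segment-weighted difference square score and next best score) are not defined in the abstract, so they are left as a caller-supplied function instead of being guessed at.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence


def majority_candidates(votes: Sequence[str]) -> List[str]:
    """Return every word that shares the highest vote count, sorted by name."""
    counts = Counter(votes)
    best = max(counts.values())
    return sorted(word for word, count in counts.items() if count == best)


def decide(
    tentative: Dict[str, Sequence[str]],
    tie_breaker: Optional[Callable[[List[str]], str]] = None,
) -> str:
    """Pick a final word from per-model, per-segment tentative words.

    `tentative` maps a stress model name to the words proposed by its
    N segments. `tie_breaker` is a hypothetical stand-in for the finer
    stages, which the abstract names but does not define.
    """
    # Pool the votes from all segments of all stress models.
    votes = [word for segments in tentative.values() for word in segments]
    if not votes:
        raise ValueError("no tentative words to vote on")

    candidates = majority_candidates(votes)
    if len(candidates) == 1:
        return candidates[0]  # the coarse stage alone is decisive
    if tie_breaker is None:
        raise ValueError("majority vote tied and no tie_breaker was given")
    return tie_breaker(candidates)  # defer to a finer stage


# Made-up example: four stress models, N = 3 segments each.
example = {
    "neutral": ["open", "open", "close"],
    "angry": ["open", "close", "close"],
    "loud": ["open", "open", "stop"],
    "lombard": ["close", "open", "open"],
}
final_word = decide(example)
```

Keeping the tie-breaker as a parameter mirrors the coarse-to-fine ordering in the abstract: the finer scores are consulted only when the vote does not single out one word.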
Pages: 503-521
Number of pages: 19