A new parameter smoothing method in the hybrid TDNN/HMM architecture for speech recognition

Cited by: 3
Authors
Jang, CS [1]
Un, CK [1]
Affiliation
[1] KOREA ADV INST SCI & TECHNOL, DEPT ELECT ENGN, COMMUN RES LAB, TAEJON 305701, SOUTH KOREA
DOI
10.1016/S0167-6393(96)00052-0
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification code
070206; 082403
Abstract
In this paper, we propose a new parameter smoothing method in the hybrid time-delay neural network (TDNN)/hidden Markov model (HMM) architecture for speech recognition. In the hybrid architecture, the TDNN and the HMM are combined by using the activations of the second hidden layer of the TDNN as the outputs of a fuzzy vector quantizer (FVQ), and the HMM algorithm is modified to accommodate these FVQ outputs. In our modular construction of the TDNN, the input layer is divided into two states to deal with the temporal structure of phonemic features, and the second hidden layer consists of two states in a time sequence. To improve the performance of the hybrid architecture, we propose a new smoothing method: the average values of the activation vectors from the second hidden layer of the modular TDNN are used to generate a smoothing matrix, from which the smoothed output-symbol observation probabilities are obtained. In simulations on speaker-independent Korean isolated-word recognition, the proposed approach reduces the error rate by 44.9% compared with the floor smoothing method.
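The abstract gives no formulas, so the following is only a minimal Python sketch under explicit assumptions, not the authors' method. It assumes (1) the FVQ membership weights w_t(k) are the second-hidden-layer activations normalized to sum to 1, so the discrete-HMM observation probability becomes b_j(o_t) = Σ_k w_t(k) b_j(k); (2) row k of the smoothing matrix S is the average normalized activation vector over training frames whose strongest unit is k, and the smoothed probabilities are b'_j(l) = Σ_k b_j(k) S(k, l); and (3) floor smoothing means clamping each probability to a small constant and renormalizing. All function names (normalize, fvq_observation_prob, build_smoothing_matrix, smooth, floor_smooth) and the toy numbers are hypothetical.

```python
"""Hypothetical sketch: FVQ-style HMM scoring plus activation-based smoothing.

The construction of the smoothing matrix is an ASSUMPTION for illustration;
it is not taken from the paper.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def normalize(v: Vector) -> Vector:
    """Scale a non-negative vector to sum to 1; an all-zero vector becomes uniform."""
    total = sum(v)
    if total <= 0:
        return [1.0 / len(v)] * len(v)
    return [x / total for x in v]


def fvq_observation_prob(b_j: Vector, activation: Vector) -> float:
    """Assumed FVQ scoring: b_j(o_t) = sum_k w_t(k) * b_j(k), w_t = normalized activations."""
    w = normalize(activation)
    return sum(wk * bk for wk, bk in zip(w, b_j))


def build_smoothing_matrix(activations: List[Vector]) -> Matrix:
    """Assumed construction: row k is the average normalized activation vector
    over the frames whose strongest hidden unit is k."""
    if not activations:
        raise ValueError("need at least one activation vector")
    n = len(activations[0])
    sums = [[0.0] * n for _ in range(n)]
    counts = [0] * n
    for a in activations:
        w = normalize(a)
        k = max(range(n), key=lambda i: w[i])  # index of the winning unit
        counts[k] += 1
        for l in range(n):
            sums[k][l] += w[l]
    matrix: Matrix = []
    for k in range(n):
        if counts[k] == 0:
            # Unit k never wins: use an identity row, i.e. leave its mass in place.
            matrix.append([1.0 if l == k else 0.0 for l in range(n)])
        else:
            matrix.append([s / counts[k] for s in sums[k]])
    return matrix


def smooth(b_j: Vector, s: Matrix) -> Vector:
    """b'_j(l) = sum_k b_j(k) * S[k][l]; each row of S sums to 1, so b'_j still sums to 1."""
    n = len(b_j)
    return [sum(b_j[k] * s[k][l] for k in range(n)) for l in range(n)]


def floor_smooth(b_j: Vector, eps: float = 1e-3) -> Vector:
    """Baseline floor smoothing: clamp each probability to at least eps, then renormalize."""
    return normalize([max(x, eps) for x in b_j])


if __name__ == "__main__":
    # Toy activations for a 3-unit second hidden layer (made-up numbers).
    frames = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0], [0.0, 0.2, 0.8]]
    S = build_smoothing_matrix(frames)
    b_state = [0.5, 0.0, 0.5]  # hypothetical output distribution of one HMM state
    print(smooth(b_state, S))
    print(floor_smooth(b_state))
```

In this sketch, a symbol that never co-occurred with a state during training (the zero in b_state) receives probability mass in proportion to how often its hidden unit co-activates with the units that did occur, whereas floor smoothing assigns every unseen symbol the same fixed constant.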
Pages: 317-324
Page count: 8