Arabic phonemes recognition using hybrid LVQ/HMM model for continuous speech recognition

Cited by: 11
Authors
Nahar, Khalid M. O. [1 ]
Abu Shquier, Mohammed [2 ]
Al-Khatib, Wasfi G. [3 ]
Al-Muhtaseb, Husni [3 ]
Elshafei, Moustafa [4 ]
Affiliations
[1] Yarmouk Univ, Fac Comp Sci & Informat Technol, Dept Comp Sci, Irbid 21163, Jordan
[2] Jarash Univ, Fac Comp Sci & Informat Technol, Dept Comp Sci, Jarash, Jordan
[3] King Fahd Univ Petr & Minerals, Informat & Comp Sci Dept, Dhahran 31261, Saudi Arabia
[4] King Fahd Univ Petr & Minerals, Dept Syst Engn, Dhahran 31261, Saudi Arabia
Keywords
Learning vector quantization (LVQ); Codebooks; K-means algorithm; Phonemes transcription; Hidden Markov model (HMM); Hybrid LVQ/HMM model;
DOI
10.1007/s10772-016-9337-5
Chinese Library Classification
TM [Electrical Engineering Technology]; TN [Electronic Technology & Communication Technology];
Subject Classification
0808; 0809;
Abstract
In an attempt to increase the rate of Arabic phoneme recognition, we introduce a novel hybrid recognition algorithm composed of learning vector quantization (LVQ) and a hidden Markov model (HMM). The hybrid algorithm is used to recognize Arabic phonemes in continuous open-vocabulary speech. A recorded Arabic corpus of various TV news broadcasts in Modern Standard Arabic was used for training and testing. We employ a data-driven approach to generate training feature vectors that embed frame-neighboring correlation information. Next, we generate the phoneme codebooks using the K-means splitting algorithm. Then, we train the generated codebooks using the LVQ algorithm. We achieved a performance of 98.49% during independent classification training and 90% during dependent classification training. When the trained LVQ codebooks were used for Arabic utterance transcription, the phoneme recognition rate was 72% using LVQ alone. We then combined the LVQ codebooks with a single-state HMM using an enhanced Viterbi algorithm that incorporates phoneme bigrams, achieving an 89% Arabic phoneme recognition rate with the hybrid LVQ/HMM algorithm.
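The abstract outlines the full pipeline: K-means-initialized phoneme codebooks, LVQ training of the codewords, and a single-state-per-phoneme Viterbi decoder whose transitions come from phoneme bigrams. The sketch below (Python/NumPy) illustrates how such a hybrid decoder could be wired together; the function names, the distance-based local score, the `beta` bigram weight, and the LVQ1 learning-rate schedule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def train_lvq1(frames, labels, codebook, codebook_labels, lr=0.05, epochs=20):
    """LVQ1 update: pull the nearest codeword toward a frame when labels agree,
    push it away when they disagree. `frames` is (N, D), `codebook` is (K, D)."""
    cb = codebook.copy()
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)                     # linearly decaying step size
        for x, y in zip(frames, labels):
            j = int(np.argmin(np.linalg.norm(cb - x, axis=1))) # nearest codeword
            sign = 1.0 if codebook_labels[j] == y else -1.0
            cb[j] += sign * rate * (x - cb[j])
    return cb

def viterbi_lvq_bigram(frames, codebook, codebook_labels, phonemes, bigram, beta=1.0):
    """One state per phoneme: the local score of frame t for phoneme p is the
    negative distance to p's nearest codeword; transitions use phoneme bigram
    log-probabilities weighted by `beta`. Returns the best phoneme sequence."""
    codebook_labels = np.asarray(codebook_labels)
    T, P = len(frames), len(phonemes)
    local = np.empty((T, P))
    for t, x in enumerate(frames):
        d = np.linalg.norm(codebook - x, axis=1)
        for p, ph in enumerate(phonemes):
            local[t, p] = -d[codebook_labels == ph].min()      # best codeword of phoneme ph
    delta = np.empty((T, P))
    back = np.zeros((T, P), dtype=int)
    delta[0] = local[0]
    for t in range(1, T):
        # scores[i, j]: best path ending in phoneme i at t-1, then moving to phoneme j
        scores = delta[t - 1][:, None] + beta * np.log(bigram + 1e-12)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + local[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [phonemes[p] for p in reversed(path)]
```

In such a setup the codebook would presumably be initialized by a per-phoneme splitting K-means pass before the LVQ fine-tuning, mirroring the codebook-generation step mentioned in the abstract; the bigram matrix would be estimated from the phoneme transcriptions of the training corpus.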
Pages: 495-508
Number of pages: 14