Hybrid HMM-BLSTM-Based Acoustic Modeling for Automatic Speech Recognition on Quran Recitation

Cited by: 0
Authors
Thirafi, Faza [1]
Lestari, Dessi Puji [1]
Affiliation
[1] Institut Teknologi Bandung, School of Electrical Engineering and Informatics, Bandung, Indonesia
Source
2018 International Conference on Asian Language Processing (IALP), 2018
Keywords
BLSTM; deep learning; hybrid; Maqam; Quran recitation
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809
Abstract
Nowadays, many software applications help people access the Quran on their own devices, and some of them also include a feature that recognizes the user's Quran recitation. The ability of such applications to recognize Quran recitation is therefore worth investigating. Compared with English and other spoken languages, Automatic Speech Recognition (ASR) on Quran recitation has become a research topic only in recent years. In several studies, the Hidden Markov Model-Gaussian Mixture Model (HMM-GMM) approach is still widely used for acoustic modeling. However, HMM-GMM generalizes poorly to high-variance data and has difficulty handling data that are not linearly separable. To address these problems, this paper proposes a deep learning approach to training the acoustic model for Quran speech recognition. Bidirectional Long Short-Term Memory (BLSTM), one of the deep learning topologies, was used in the experiments and combined with an HMM as a hybrid system. This method has previously worked well for other languages, e.g., English speech recognition. Overall, the results show that the method also performs well for Quran speech recognition compared with our HMM-GMM baseline: the baseline models achieved an average word error rate (WER) of 18.39%, whereas our experimental model (an acoustic model with hybrid HMM-BLSTM) achieved an average WER of 4.63% in the same testing scenario. This research also analyzed the effect of Quran recitation style by building models that depend on the recitation style (maqam).
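The results above are reported as word error rates (WER): the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript. The sketch below is a minimal Python illustration of that standard definition using word-level edit distance, plus the relative WER reduction implied by the two reported averages. It is not the paper's scoring tool (which this record does not name), and the transliterated example sentence is hypothetical, not taken from the paper's test data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical transliterated verse: one substituted word out of four.
    ref = "alhamdu lillahi rabbil alamin"
    hyp = "alhamdu lillahi rabbi alamin"
    print(word_error_rate(ref, hyp))  # 0.25
    # Average WERs reported in the abstract: 18.39% (HMM-GMM) vs 4.63% (HMM-BLSTM).
    print(f"{relative_reduction(18.39, 4.63):.1%}")  # 74.8%
```

With the reported averages, the relative reduction is about 74.8%, i.e., the hybrid HMM-BLSTM model removes roughly three quarters of the baseline's average word error rate.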
Pages: 203-208
Number of pages: 6