Mongolian Speech Recognition Based on Deep Neural Networks

Cited by: 7
Authors
Zhang, Hui [1]
Bao, Feilong [1]
Gao, Guanglai [1]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China
Source
CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (CCL 2015) | 2015 / Vol. 9427
Keywords
Mongolian; Deep Neural Networks (DNNs); Gaussian Mixture Models (GMMs); N-gram language model
DOI
10.1007/978-3-319-25816-4_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Mongolian is an influential language, and better Mongolian Large Vocabulary Continuous Speech Recognition (LVCSR) systems are required. Recently, speech recognition research has improved substantially with the introduction of Deep Neural Networks (DNNs). In this study, a DNN-based Mongolian LVCSR system is built. Experimental results show that the DNN-based models outperform conventional models based on Gaussian Mixture Models (GMMs) for Mongolian speech recognition by a large margin. Compared with the best GMM-based model, the DNN-based one obtains a relative improvement of over 50%, making it a new state-of-the-art system in this field.
Pages: 180-188
Page count: 9