A New Neural Network Based Logistic Regression Classifier For Improving Mispronunciation Detection of L2 Language Learners

Cited by: 0
Authors
Hu, Wenping [1, 2]
Qian, Yao [2]
Soong, Frank K. [2]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] Microsoft Res, Beijing, Peoples R China
Source
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014
Keywords
CALL; Mispronunciation Detection; Deep Neural Network; Logistic Regression;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a Neural Network (NN)-based Logistic Regression (LR) classifier for improving the phone mispronunciation detection rate in a Computer-Aided Language Learning (CALL) system. A general neural network with multiple hidden layers for extracting useful speech features is first trained with pooled training data, and then phone-dependent, 2-class logistic regression classifiers are trained as individual, phoneme-specific nodes at the output layer. This new NN-based classifier with shared hidden layers streamlines the time-consuming work of training multiple individual classifiers separately, i.e., one per phoneme, and learns a common feature representation via the shared hidden layers. Its improved performance, compared with independently trained, phoneme-specific classifiers, is verified on a test database of isolated English words recorded by non-native English learners. Compared with the conventional Goodness of Pronunciation (GOP)-based approach, the NN-based LR classifier improves precision and recall by 37.1% and 11.7% (absolute), respectively. On the same test data, it also outperforms a Support Vector Machine (SVM)-based classifier, which is widely used for mispronunciation detection: at a slightly better precision, recall is improved by 10.6% absolute (21.6% relative).
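The architecture described in the abstract (shared hidden layers feeding one 2-class logistic regression node per phone) can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `SharedHiddenPhoneLR`, the layer sizes, the sigmoid hidden units, the random initialization standing in for hidden layers trained on pooled data, and the choice to keep the hidden layers frozen while fitting each phone's output node are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Affine layer followed by a sigmoid; `weights` holds one row per output unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


class SharedHiddenPhoneLR:
    """Hypothetical sketch: shared hidden layers plus one LR output node per phone."""

    def __init__(self, n_in, hidden_sizes, phones, seed=0):
        rng = random.Random(seed)
        # Assumption: random weights stand in for hidden layers that would
        # first be trained on pooled data from all phones.
        self.hidden = []
        prev = n_in
        for size in hidden_sizes:
            w = [[rng.uniform(-0.1, 0.1) for _ in range(prev)] for _ in range(size)]
            self.hidden.append((w, [0.0] * size))
            prev = size
        # One (weight vector, bias) pair per phone: a 2-class LR node.
        self.heads = {p: ([rng.uniform(-0.1, 0.1) for _ in range(prev)], 0.0)
                      for p in phones}

    def shared_features(self, x):
        """Run the input through the hidden layers shared by all phones."""
        h = x
        for w, b in self.hidden:
            h = dense(h, w, b)
        return h

    def p_mispronounced(self, x, phone):
        """Probability of the 'mispronounced' class from the phone's own node."""
        w, b = self.heads[phone]
        h = self.shared_features(x)
        return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)

    def train_head(self, phone, examples, lr=0.5, epochs=200):
        """Fit one phone's LR node by SGD on cross-entropy.

        Assumption: the shared hidden layers stay fixed during this step.
        `examples` is a list of (feature_vector, label) with label 1 = mispronounced.
        """
        w, b = self.heads[phone]
        feats = [(self.shared_features(x), y) for x, y in examples]
        for _ in range(epochs):
            for h, y in feats:
                p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
                grad = p - y  # gradient of cross-entropy w.r.t. the logit
                w = [wi - lr * grad * hi for wi, hi in zip(w, h)]
                b -= lr * grad
        self.heads[phone] = (w, b)


model = SharedHiddenPhoneLR(n_in=3, hidden_sizes=[4, 4], phones=["ae", "ih"])
model.train_head("ae", [([0.2, -0.1, 0.7], 1), ([-0.5, 0.3, 0.1], 0)])
print(model.p_mispronounced([0.2, -0.1, 0.7], "ae"))
```

In this sketch, adding a phone only adds one weight vector and bias on top of the shared features, which mirrors the abstract's point that the shared hidden layers avoid training a full separate classifier for each phoneme.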
Pages: 245 / +
Page count: 2