A New Neural Network Based Logistic Regression Classifier For Improving Mispronunciation Detection of L2 Language Learners

Cited by: 0

Authors:

Hu, Wenping ^{[1,2]}

Qian, Yao ^{[2]}

Soong, Frank K. ^{[2]}

Affiliations:

[1] Univ Sci & Technol China, Hefei 230026, Peoples R China

[2] Microsoft Res, Beijing, Peoples R China

Source:

2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014

Keywords:

CALL; Mispronunciation Detection; Deep Neural Network; Logistic Regression;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we propose a Neural Network (NN)-based Logistic Regression (LR) classifier for improving the phone mispronunciation detection rate in a Computer-Aided Language Learning (CALL) system. A general neural network with multiple hidden layers for extracting useful speech features is first trained with pooled training data, and then phone-dependent, 2-class logistic regression classifiers are trained as individual, phoneme-specific nodes at the output layer. This new NN-based classifier with shared hidden layers streamlines the time-consuming work of training multiple individual classifiers separately, i.e., one for each phoneme, and learns a common feature representation via the shared hidden layers. Its improved performance over independently trained, phoneme-specific classifiers is verified on a test database of isolated English words recorded by non-native English learners. Compared with the conventional Goodness of Pronunciation (GOP)-based approach, the NN-based LR classifier improves precision and recall by 37.1% and 11.7% (absolute), respectively. On the same test data, it also outperforms a Support Vector Machine (SVM)-based classifier, which is widely used for mispronunciation detection: at a slightly better precision, recall is improved by 10.6% absolute (21.6% relative).
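The structure described in the abstract (shared hidden layers feeding one 2-class logistic regression node per phone) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the class name `SharedHiddenLRClassifier`, the layer sizes, the sigmoid activations, and the learning rate are all assumptions, and the hidden layers are simply left at their random initialization as a stand-in for the pooled-data training step that the abstract mentions but does not detail.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SharedHiddenLRClassifier:
    """Shared hidden layers with one 2-class logistic output node per phone.

    Hypothetical sketch: names, sizes, and activations are assumptions.
    """

    def __init__(self, n_features, hidden_sizes, n_phones, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_features, *hidden_sizes]
        # Shared hidden layers (stand-in for a network trained on pooled data).
        self.hidden = [
            (rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])
        ]
        # One column of output weights per phone-specific logistic node.
        self.W_out = rng.normal(0.0, 0.1, (sizes[-1], n_phones))
        self.b_out = np.zeros(n_phones)

    def features(self, x):
        """Pass inputs through the shared hidden layers."""
        h = x
        for W, b in self.hidden:
            h = sigmoid(h @ W + b)
        return h

    def predict_proba(self, x, phone_ids):
        """Probability of label 1, read from each sample's own phone node."""
        logits = self.features(x) @ self.W_out + self.b_out
        return sigmoid(logits[np.arange(len(x)), phone_ids])

    def fit_output_nodes(self, x, phone_ids, labels, lr=0.1, epochs=200):
        """Gradient descent on the phone-specific nodes; hidden layers frozen."""
        h = self.features(x)
        for _ in range(epochs):
            for p in np.unique(phone_ids):
                mask = phone_ids == p
                hp, yp = h[mask], labels[mask]
                prob = sigmoid(hp @ self.W_out[:, p] + self.b_out[p])
                err = prob - yp  # cross-entropy gradient w.r.t. the logit
                self.W_out[:, p] -= lr * (hp.T @ err) / len(yp)
                self.b_out[p] -= lr * err.mean()
```

In this sketch each sample is scored only by the node for its own phone, so adding a phone costs one extra weight column rather than a whole separate classifier, which is the kind of saving the abstract attributes to sharing the hidden layers.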

Citation

Pages: 245 / +

Page count: 2

