Mispronunciation Detection Leveraging Maximum Performance Criterion Training of Acoustic Models and Decision Functions

Cited by: 4

Authors:

Hsu, Yao-Chi [1]

Yang, Min-Han [1]

Hung, Hsiao-Tsung [1]

Chen, Berlin [1]

Affiliations:

[1] Natl Taiwan Normal Univ, Dept Comp Sci & Informat Engn, Taipei, Taiwan

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

computer assisted pronunciation training; mispronunciation detection; discriminative training; deep neural networks

DOI:

10.21437/Interspeech.2016-1602

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Mispronunciation detection is part and parcel of a computer assisted pronunciation training (CAPT) system, helping second-language (L2) learners pinpoint erroneous pronunciations in a given utterance so as to improve their spoken proficiency. This paper presents a continuation of this general line of research, and its major contributions are twofold. First, we present an effective training approach that estimates the deep neural network based acoustic models involved in the mispronunciation detection process by optimizing an objective directly linked to the ultimate evaluation metric. Second, in the same vein, two disparate logistic sigmoid based decision functions with either phone- or senone-dependent parameterization are also inferred and used for enhanced mispronunciation detection. A series of experiments on a Mandarin mispronunciation detection task seem to show the performance merits of the proposed method.
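The abstract does not give the form of the decision functions, so the following is only a minimal sketch of what a logistic sigmoid decision function with phone-dependent parameters could look like. The score input, the parameter names `alpha` and `beta`, the example phone labels, and all numeric values are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical per-phone parameters (slope alpha, offset beta).
# In the paper these would be learned; the values here are made up.
PHONE_PARAMS = {
    "zh": (2.0, -1.5),
    "a": (1.0, -0.5),
}
DEFAULT_PARAMS = (1.0, -1.0)  # assumed fallback for phones without their own parameters


def sigmoid_decision(score, alpha, beta):
    """Map a pronunciation score to a value in (0, 1).

    Assumes higher scores mean better pronunciation, so with alpha > 0
    the output rises as the score falls below the offset beta.
    """
    return 1.0 / (1.0 + math.exp(alpha * (score - beta)))


def is_mispronounced(phone, score, threshold=0.5):
    """Flag a phone segment as mispronounced using its phone-dependent parameters."""
    alpha, beta = PHONE_PARAMS.get(phone, DEFAULT_PARAMS)
    return sigmoid_decision(score, alpha, beta) >= threshold
```

Under these assumptions, `is_mispronounced("zh", -3.0)` flags the segment while `is_mispronounced("zh", 0.0)` does not. A senone-dependent variant would key the parameter table by senone identifier instead of phone label.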


Pages: 2646-2650

Page count: 5

