INTEGRATING ARTICULATORY FEATURES INTO ACOUSTIC-PHONEMIC MODEL FOR MISPRONUNCIATION DETECTION AND DIAGNOSIS IN L2 ENGLISH SPEECH

Times cited: 0
Authors
Mao, Shaoguang [1]
Wu, Zhiyong [1, 2]
Li, Xu [2]
Li, Runnan [1]
Wu, Xixin [2]
Meng, Helen [1, 2]
Affiliations
[1] Tsinghua Univ, Grad Sch Shenzhen, Tsinghua CUHK Joint Res Ctr Media Sci Technol & S, Beijing, Peoples R China
[2] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME) | 2018
Keywords
Computer-aided pronunciation training; mispronunciation detection and diagnosis; articulatory features; acoustic-phonemic model; multi-task learning
DOI
Not available
CLC classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This paper proposes novel approaches to mispronunciation detection and diagnosis (MDD) of second-language (L2) learners' speech using articulatory features. Here, articulatory features describe the positions of the articulators when a phoneme is pronounced, and thus reflect the pronunciation mechanism of each phoneme. Using articulatory features in MDD helps to distinguish between phonemes. Three models that incorporate articulatory features are proposed, all based on the acoustic-phonemic model (APM): 1) the articulatory-acoustic-phonemic model (AAPM), which embeds articulatory features directly into the input features; 2) AAPM with feature representation (R-AAPM), which re-represents the original input features using articulatory features; and 3) the articulatory multi-task acoustic-phonemic model (A-MT-APM), in which a phoneme recognizer and articulatory feature classifiers are trained simultaneously in a multi-task manner. Compared with the baseline phoneme-based APM, the proposed approaches perform better in mispronunciation detection and diagnosis, as measured by precision, recall and F1-measure. Specifically, the A-MT-APM approach achieves improvements of 5.6% in F1-measure and 7.0% in diagnostic accuracy. The contributions of this work are: 1) introducing articulatory features to MDD within a deep learning framework; and 2) investigating several model architectures that better exploit articulatory features.
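The abstract reports MDD results in terms of precision, recall, F1-measure and diagnostic accuracy without defining them. The following is a minimal Python sketch of how such metrics are commonly computed in the MDD literature, assuming the usual hierarchy of true acceptance (TA), false rejection (FR), false acceptance (FA) and true rejection (TR), with TR split into correct diagnosis (CD) and diagnosis error (DE). It further assumes that the canonical, annotated and recognized phone sequences are already aligned one-to-one; this record does not give the paper's exact counting procedure, and the names `count_mdd`, `mdd_metrics` and `MddCounts` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# One aligned phone slot: (canonical, annotated, recognized).
# ASSUMPTION: the three sequences are already aligned one-to-one;
# insertions and deletions are not handled in this sketch.
PhoneTriple = Tuple[str, str, str]


@dataclass
class MddCounts:
    ta: int = 0  # true acceptance: correct phone accepted
    fr: int = 0  # false rejection: correct phone flagged as mispronounced
    fa: int = 0  # false acceptance: mispronunciation missed
    cd: int = 0  # true rejection with correct diagnosis
    de: int = 0  # true rejection with diagnosis error


def count_mdd(triples: Iterable[PhoneTriple]) -> MddCounts:
    """Tally detection and diagnosis outcomes against the canonical phones."""
    counts = MddCounts()
    for canonical, annotated, recognized in triples:
        mispronounced = annotated != canonical  # ground truth from annotators
        rejected = recognized != canonical      # system's detection decision
        if not mispronounced:
            if rejected:
                counts.fr += 1
            else:
                counts.ta += 1
        elif not rejected:
            counts.fa += 1
        elif recognized == annotated:
            counts.cd += 1  # system recognized the phone actually produced
        else:
            counts.de += 1
    return counts


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a category is empty.
    return num / den if den else 0.0


def mdd_metrics(counts: MddCounts) -> Dict[str, float]:
    tr = counts.cd + counts.de  # all true rejections
    precision = _safe_div(tr, tr + counts.fr)
    recall = _safe_div(tr, tr + counts.fa)
    return {
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "diagnostic_accuracy": _safe_div(counts.cd, tr),
    }


if __name__ == "__main__":
    # Toy data (made up, not from the paper): one slot per outcome type.
    toy = [
        ("th", "th", "th"),  # TA
        ("r", "r", "l"),     # FR
        ("v", "w", "v"),     # FA
        ("th", "s", "s"),    # TR, correct diagnosis
        ("th", "s", "f"),    # TR, diagnosis error
    ]
    print(mdd_metrics(count_mdd(toy)))
```

On the toy data, precision, recall and F1-measure all equal 2/3 and diagnostic accuracy is 0.5. Diagnostic accuracy is scored only over true rejections, which is why it is reported separately from the detection metrics.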
Pages: 6