Mispronunciation Detection and Diagnosis in L2 English Speech Using Multi-Distribution Deep Neural Networks

Cited by: 0

Authors:

Li, Kun [1]

Meng, Helen [1]

Affiliations:

[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Human Comp Communicat Lab, Hong Kong, Peoples R China

Source:

2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014

Keywords:

speech recognition; mispronunciation detection and diagnosis; L2 English speech; deep neural networks;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper investigates the use of multi-distribution Deep Neural Networks (DNNs) for mispronunciation detection and diagnosis (MD&D). Our existing approach uses extended recognition networks (ERNs) to constrain the recognition paths to the canonical pronunciation of the target words and the likely phonetic mispronunciations. Although this approach is viable, it has some problems: (1) deriving appropriate phonological rules to generate the ERNs remains a challenging task; (2) the acoustic model (AM) and the phonological rules are trained independently and hence contextual information is lost; and (3) phones missing from the ERNs cannot be recognized even if we have a well-trained AM. Hence we propose an Acoustic Phonological Model (APM) using a multi-distribution DNN, whose input features include acoustic features and corresponding canonical pronunciations. The APM can implicitly learn the phonological rules from the canonical productions and annotated mispronunciations in the training data. Furthermore, the APM can also capture the relationships between the phonological rules and related acoustic features. As we do not restrict any pathways as in the ERNs, all phones can be recognized if we have a perfect APM. Experiments show that our method achieves an accuracy of 83.3% and a correctness of 88.5%. It significantly outperforms the approach of forced-alignment with ERNs whose correctness is 75.9%.
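The abstract says the APM's input combines acoustic features with the corresponding canonical pronunciations. A minimal sketch of one way such a mixed input vector could be assembled is shown here, assuming real-valued acoustic coefficients concatenated with binary one-hot indicators for the canonical phones. The phone inventory `PHONES`, the helper names `one_hot` and `build_apm_input`, the three-phone context, and the feature values are all hypothetical illustrations, not details taken from the paper.

```python
from typing import List, Sequence

# Hypothetical toy phone inventory; the abstract does not list the real phone set.
PHONES = ["sil", "ae", "k", "t"]


def one_hot(phone: str, inventory: Sequence[str] = PHONES) -> List[float]:
    """Return a binary indicator vector marking `phone` within `inventory`."""
    if phone not in inventory:
        raise ValueError(f"unknown phone: {phone}")
    return [1.0 if p == phone else 0.0 for p in inventory]


def build_apm_input(acoustic_frame: Sequence[float],
                    canonical_context: Sequence[str]) -> List[float]:
    """Concatenate real-valued acoustic features with binary canonical-phone features.

    Assumption: the two parts follow different distributions (continuous vs.
    binary), which is one plausible reading of "multi-distribution" input.
    """
    features = [float(v) for v in acoustic_frame]
    for phone in canonical_context:
        features.extend(one_hot(phone))
    return features


# Placeholder acoustic values for a single frame (not real speech features).
frame = [0.12, -0.53, 1.07]
# Canonical pronunciation context for the word "cat": previous, current, next phone.
x = build_apm_input(frame, ["k", "ae", "t"])
print(len(x))  # 3 acoustic values + 3 phones * 4 inventory entries
```

In this sketch, a network trained on such vectors would see the expected phone alongside the acoustics, which is how it could pick up phonological patterns without hand-written rules; the actual network architecture and training procedure are not described in the abstract and are not reproduced here.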

Citation

Pages: 255-259

Page count: 5
