Mispronunciation Detection and Diagnosis in L2 English Speech Using Multi-Distribution Deep Neural Networks

Authors
Li, Kun [1]
Meng, Helen [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Human Comp Communicat Lab, Hong Kong, Peoples R China
Source
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2014
Keywords
speech recognition; mispronunciation detection and diagnosis; L2 English speech; deep neural networks;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates the use of multi-distribution Deep Neural Networks (DNNs) for mispronunciation detection and diagnosis (MD&D). Our existing approach uses extended recognition networks (ERNs) to constrain the recognition paths to the canonical pronunciation of the target words and the likely phonetic mispronunciations. Although this approach is viable, it has some problems: (1) deriving appropriate phonological rules to generate the ERNs remains a challenging task; (2) the acoustic model (AM) and the phonological rules are trained independently and hence contextual information is lost; and (3) phones missing from the ERNs cannot be recognized even if we have a well-trained AM. Hence we propose an Acoustic Phonological Model (APM) using a multi-distribution DNN, whose input features include acoustic features and corresponding canonical pronunciations. The APM can implicitly learn the phonological rules from the canonical productions and annotated mispronunciations in the training data. Furthermore, the APM can also capture the relationships between the phonological rules and related acoustic features. As we do not restrict any pathways as in the ERNs, all phones can be recognized if we have a perfect APM. Experiments show that our method achieves an accuracy of 83.3% and a correctness of 88.5%. It significantly outperforms the approach of forced-alignment with ERNs whose correctness is 75.9%.
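The abstract states that the APM's input combines acoustic features with the corresponding canonical pronunciations. A minimal sketch of one way such an input vector could be assembled is shown below. The toy phone inventory, the one-hot encoding of the canonical phone, the feature values, and the function names `one_hot` and `build_apm_input` are all illustrative assumptions; the abstract does not specify the actual feature layout or phone set.

```python
from typing import List, Sequence

# Hypothetical toy phone inventory (assumption: the real phone set is not
# given in the abstract).
PHONES = ["sil", "ae", "k", "t"]


def one_hot(phone: str, inventory: Sequence[str] = PHONES) -> List[float]:
    """Return a binary indicator vector for `phone` over `inventory`."""
    if phone not in inventory:
        raise ValueError(f"unknown phone: {phone!r}")
    return [1.0 if p == phone else 0.0 for p in inventory]


def build_apm_input(acoustic_frame: Sequence[float],
                    canonical_phone: str) -> List[float]:
    """Concatenate real-valued acoustic features with binary
    canonical-phone features into a single input vector."""
    return list(acoustic_frame) + one_hot(canonical_phone)


# Made-up acoustic features for one frame (illustration only).
frame = [0.12, -0.53, 1.07]
x = build_apm_input(frame, "k")
print(x)  # [0.12, -0.53, 1.07, 0.0, 0.0, 1.0, 0.0]
```

The resulting vector mixes real-valued and binary components, which is one plausible reading of "multi-distribution" input; how the paper's network actually models each part is not described in the abstract.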
Pages: 255-259
Page count: 5