Mispronunciation Detection and Diagnosis in L2 English Speech Using Multi-Distribution Deep Neural Networks

Cited by: 0
Authors
Li, Kun [1]
Meng, Helen [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Human Comp Communicat Lab, Hong Kong, Peoples R China
Source
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014
Keywords
speech recognition; mispronunciation detection and diagnosis; L2 English speech; deep neural networks
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates the use of multi-distribution Deep Neural Networks (DNNs) for mispronunciation detection and diagnosis (MD&D). Our existing approach uses extended recognition networks (ERNs) to constrain the recognition paths to the canonical pronunciation of the target words and the likely phonetic mispronunciations. Although this approach is viable, it has some problems: (1) deriving appropriate phonological rules to generate the ERNs remains a challenging task; (2) the acoustic model (AM) and the phonological rules are trained independently and hence contextual information is lost; and (3) phones missing from the ERNs cannot be recognized even if we have a well-trained AM. Hence we propose an Acoustic Phonological Model (APM) using a multi-distribution DNN, whose input features include acoustic features and corresponding canonical pronunciations. The APM can implicitly learn the phonological rules from the canonical productions and annotated mispronunciations in the training data. Furthermore, the APM can also capture the relationships between the phonological rules and related acoustic features. As we do not restrict any pathways as in the ERNs, all phones can be recognized if we have a perfect APM. Experiments show that our method achieves an accuracy of 83.3% and a correctness of 88.5%. It significantly outperforms the approach of forced-alignment with ERNs whose correctness is 75.9%.
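The abstract describes an APM whose input combines acoustic features with the corresponding canonical pronunciations. A minimal sketch of that idea follows, assuming a toy phone inventory, made-up feature values, a three-phone canonical context, and an untrained one-hidden-layer network; none of these details come from the paper, and all names (`PHONES`, `build_input`, `phone_posteriors`) are hypothetical. It only illustrates how real-valued and binary inputs can be concatenated into one vector and mapped to an unrestricted distribution over phones.

```python
import math
import random

# Toy phone inventory (hypothetical; the paper's phone set is not given here).
PHONES = ["sil", "ae", "k", "t"]

def one_hot(phone):
    """Binary indicator vector for one canonical phone."""
    return [1.0 if p == phone else 0.0 for p in PHONES]

def build_input(acoustic_frame, canonical_context):
    """Concatenate real-valued acoustic features with binary phone indicators."""
    vec = list(acoustic_frame)
    for phone in canonical_context:
        vec.extend(one_hot(phone))
    return vec

def init_layer(n_in, n_out, rng):
    """Small random weights, one row per output unit (untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def affine(weights, x):
    """Weighted sum of the inputs for each output unit (no bias term)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softmax(scores):
    """Normalize scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def phone_posteriors(x, w_hidden, w_out):
    """Sigmoid hidden layer followed by a softmax over every phone."""
    hidden = [1.0 / (1.0 + math.exp(-a)) for a in affine(w_hidden, x)]
    return softmax(affine(w_out, hidden))

rng = random.Random(0)
frame = [0.12, -0.40, 0.88]      # made-up acoustic features for one frame
context = ["k", "ae", "t"]       # assumed canonical phones: previous, current, next
x = build_input(frame, context)
w_hidden = init_layer(len(x), 8, rng)
w_out = init_layer(8, len(PHONES), rng)
posteriors = phone_posteriors(x, w_hidden, w_out)
```

Because the output layer covers the whole inventory, no phone is excluded in advance, which is the contrast the abstract draws with ERN-constrained recognition. A trained model would replace the random weights used here.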
Pages: 255-259
Number of pages: 5