Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks

Cited by: 97
Authors
Li, Kun [1 ]
Qian, Xiaojun [1 ]
Meng, Helen [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Keywords
Deep neural networks; L2 English speech; mispronunciation detection; mispronunciation diagnosis; speech recognition; PRONUNCIATION ERROR PATTERNS; UNSUPERVISED DISCOVERY; MODELS; REPRESENTATIONS; RECOGNITION; AGREEMENT;
DOI
10.1109/TASLP.2016.2621675
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
This paper investigates the use of multidistribution deep neural networks (DNNs) for mispronunciation detection and diagnosis (MDD), to circumvent the difficulties encountered in an existing approach based on extended recognition networks (ERNs). The ERNs leverage existing automatic speech recognition technology by constraining the search space via including the likely phonetic error patterns of the target words in addition to the canonical transcriptions. MDDs are achieved by comparing the recognized transcriptions with the canonical ones. Although this approach performs reasonably well, it has the following issues: 1) Learning the error patterns of the target words to generate the ERNs remains a challenging task. Phones or phone errors missing from the ERNs cannot be recognized even if we have well-trained acoustic models; and 2) acoustic models and phonological rules are trained independently, and hence, contextual information is lost. To address these issues, we propose an acoustic-graphemic-phonemic model (AGPM) using a multidistribution DNN, whose input features include acoustic features, as well as corresponding graphemes and canonical transcriptions (encoded as binary vectors). The AGPM can implicitly model both grapheme-to-likely-pronunciation and phoneme-to-likely-pronunciation conversions, which are integrated into acoustic modeling. With the AGPM, we develop a unified MDD framework, which works much like free-phone recognition. Experiments show that our method achieves a phone error rate (PER) of 11.1%. The false rejection rate (FRR), false acceptance rate (FAR), and diagnostic error rate (DER) for MDD are 4.6%, 30.5%, and 13.5%, respectively. It outperforms the ERN approach using DNNs as acoustic models, whose PER, FRR, FAR, and DER are 16.8%, 11.0%, 43.6%, and 32.3%, respectively.
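The abstract describes the AGPM's input as a concatenation of continuous acoustic features with binary (one-hot) encodings of the aligned grapheme and the canonical phoneme. The following is a minimal illustrative sketch of that input encoding only, not the paper's implementation; the feature dimension, symbol inventories, and function names here are assumptions chosen for the example.

```python
import numpy as np

# Toy symbol inventories (illustrative, not the paper's actual sets).
GRAPHEMES = list("abcdefghijklmnopqrstuvwxyz")
PHONEMES = ["aa", "ae", "ah", "b", "d", "iy", "k", "s", "t"]

def one_hot(symbol, inventory):
    """Binary vector with a 1 at the symbol's index in the inventory."""
    vec = np.zeros(len(inventory))
    vec[inventory.index(symbol)] = 1.0
    return vec

def agpm_input(acoustic_frame, grapheme, canonical_phoneme):
    """Concatenate acoustic features with grapheme/phoneme binary codes,
    mirroring the multidistribution input described in the abstract."""
    return np.concatenate([
        acoustic_frame,
        one_hot(grapheme, GRAPHEMES),
        one_hot(canonical_phoneme, PHONEMES),
    ])

# Example: a 39-dim acoustic frame (e.g., MFCCs with deltas, illustrative)
# aligned to the grapheme "c" pronounced canonically as /k/.
frame = np.random.randn(39)
x = agpm_input(frame, "c", "k")
print(x.shape)  # (74,) = 39 acoustic + 26 grapheme + 9 phoneme dims
```

A DNN trained on such joint vectors can, as the abstract notes, implicitly learn grapheme-to-likely-pronunciation and phoneme-to-likely-pronunciation mappings alongside acoustic modeling.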
Pages: 193 - 207
Page count: 15
Related Papers
(50 records)
  • [41] L2 English Rhythm in Read Speech by Chinese Students
    Ding, Hongwei
    Xu, Xinping
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2696 - 2700
  • [42] Use and accuracy of verb complements in English L2 speech
    Vercellotti, Mary Lou
    de Jong, Nel
    DUTCH JOURNAL OF APPLIED LINGUISTICS, 2013, 2 (02) : 243 - 250
  • [43] Rhythm and Disfluency: Interactions in Chinese L2 English Speech
    Yu, Jue
    Zhang, Lu
    Wu, Shengyi
    Zhang, Beihua
    2017 20TH CONFERENCE OF THE ORIENTAL CHAPTER OF THE INTERNATIONAL COORDINATING COMMITTEE ON SPEECH DATABASES AND SPEECH I/O SYSTEMS AND ASSESSMENT (O-COCOSDA), 2017, : 139 - 144
  • [44] Analysis of forced aligner performance on L2 English speech
    Williams, Samantha
    Foulkes, Paul
    Hughes, Vincent
    SPEECH COMMUNICATION, 2024, 158
  • [45] Adaptive L2 control of nonlinear systems using neural networks
    Qu, Huaijing
    Journal of Control Theory and Applications, 2004, (04) : 332 - 338
  • [46] Adaptive L2 control of nonlinear systems using neural networks
    Qu, Huaijing
    Zhang, Ying
    Sun, Fengrong
    Journal of Control Theory and Applications, 2004, 2 (4): : 332 - 338
  • [47] L1 Identification from L2 Speech Using Neural Spectrogram Analysis
    Graham, Calbert
    INTERSPEECH 2021, 2021, : 3959 - 3963
  • [48] EMOTION DETECTION IN SPEECH USING DEEP NETWORKS
    Amer, Mohamed R.
    Siddiquie, Behjat
    Richey, Colleen
    Divakaran, Ajay
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [49] Enhance the Performance of Deep Neural Networks via L2 Regularization on the Input of Activations
    Shi, Guang
    Zhang, Jiangshe
    Li, Huirong
    Wang, Changpeng
    Neural Processing Letters, 2019, 50 : 57 - 75
  • [50] Enhance the Performance of Deep Neural Networks via L2 Regularization on the Input of Activations
    Shi, Guang
    Zhang, Jiangshe
    Li, Huirong
    Wang, Changpeng
    NEURAL PROCESSING LETTERS, 2019, 50 (01) : 57 - 75