Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks

Cited by: 97
Authors
Li, Kun [1 ]
Qian, Xiaojun [1 ]
Meng, Helen [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
Keywords
Deep neural networks; L2 English speech; mispronunciation detection; mispronunciation diagnosis; speech recognition; PRONUNCIATION ERROR PATTERNS; UNSUPERVISED DISCOVERY; MODELS; REPRESENTATIONS; RECOGNITION; AGREEMENT;
DOI
10.1109/TASLP.2016.2621675
CLC number
O42 [Acoustics]
Subject classification codes
070206 ; 082403 ;
Abstract
This paper investigates the use of multidistribution deep neural networks (DNNs) for mispronunciation detection and diagnosis (MDD), to circumvent the difficulties encountered in an existing approach based on extended recognition networks (ERNs). The ERNs leverage existing automatic speech recognition technology by constraining the search space via including the likely phonetic error patterns of the target words in addition to the canonical transcriptions. MDDs are achieved by comparing the recognized transcriptions with the canonical ones. Although this approach performs reasonably well, it has the following issues: 1) Learning the error patterns of the target words to generate the ERNs remains a challenging task. Phones or phone errors missing from the ERNs cannot be recognized even if we have well-trained acoustic models; and 2) acoustic models and phonological rules are trained independently, and hence, contextual information is lost. To address these issues, we propose an acoustic-graphemic-phonemic model (AGPM) using a multidistribution DNN, whose input features include acoustic features, as well as corresponding graphemes and canonical transcriptions (encoded as binary vectors). The AGPM can implicitly model both grapheme-to-likely-pronunciation and phoneme-to-likely-pronunciation conversions, which are integrated into acoustic modeling. With the AGPM, we develop a unified MDD framework, which works much like free-phone recognition. Experiments show that our method achieves a phone error rate (PER) of 11.1%. The false rejection rate (FRR), false acceptance rate (FAR), and diagnostic error rate (DER) for MDD are 4.6%, 30.5%, and 13.5%, respectively. It outperforms the ERN approach using DNNs as acoustic models, whose PER, FRR, FAR, and DER are 16.8%, 11.0%, 43.6%, and 32.3%, respectively.
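The abstract describes the AGPM input as acoustic features concatenated with the aligned grapheme and canonical phoneme, each encoded as a binary vector. As a rough illustration of that input construction (not the authors' implementation), the sketch below uses invented toy symbol inventories and an assumed 39-dimensional acoustic frame:

```python
import numpy as np

# Toy inventories for the sketch; the paper's actual grapheme and phoneme
# sets are larger and are not reproduced here.
GRAPHEMES = list("abcdefghijklmnopqrstuvwxyz")
PHONEMES = ["aa", "ae", "b", "d", "iy", "k", "s", "t"]  # toy subset

def one_hot(symbol, inventory):
    """Binary vector with a 1 at the symbol's index in the inventory."""
    vec = np.zeros(len(inventory), dtype=np.float32)
    vec[inventory.index(symbol)] = 1.0
    return vec

def agpm_input(acoustic_frame, grapheme, phoneme):
    """Concatenate acoustic features with binary grapheme and
    canonical-phoneme encodings, AGPM-style."""
    return np.concatenate([
        np.asarray(acoustic_frame, dtype=np.float32),
        one_hot(grapheme, GRAPHEMES),
        one_hot(phoneme, PHONEMES),
    ])

# A 39-dim MFCC-like frame aligned to grapheme 'c' with canonical phoneme 'k'
frame = np.random.randn(39)
x = agpm_input(frame, "c", "k")
print(x.shape)  # (73,)  i.e. 39 acoustic + 26 grapheme + 8 phoneme dims
```

The multidistribution DNN then consumes such mixed continuous/binary vectors frame by frame, so the grapheme-to-pronunciation and phoneme-to-pronunciation cues are learned jointly with the acoustic model rather than as separate phonological rules.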
Pages: 193-207
Number of pages: 15
Related papers (50 records)
  • [31] L2 Mispronunciation Verification Based on Acoustic Phone Embedding and Siamese Networks
    Xie, Yanlu
    Wang, Zhenyu
    Fu, Kaiqi
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2023, 95(7): 921-931
  • [32] L2 Mispronunciation Verification Based on Acoustic Phone Embedding and Siamese Networks
    Wang, Zhenyu
    Zhang, Jinsong
    Xie, Yanlu
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018: 444-448
  • [33] Combining Speech Features for Aggression Detection Using Deep Neural Networks
    Jaafar, Noussaiba
    Lachiri, Zied
    2020 5TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP'2020), 2020
  • [34] Arabic Hate Speech Detection Using Deep Recurrent Neural Networks
    Al Anezi, Faisal Yousif
    APPLIED SCIENCES-BASEL, 2022, 12(12)
  • [35] Speech watermarking using Deep Neural Networks
    Pavlovic, Kosta
    Kovacevic, Slavko
    Durovic, Igor
    2020 28TH TELECOMMUNICATIONS FORUM (TELFOR), 2020: 292-295
  • [36] Double Compressed Wideband AMR Speech Detection Using Deep Neural Networks
    Buker, Aykut
    Hanilci, Cemal
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43(7): 4528-4546
  • [37] Automatic Hate Speech Detection Using Deep Neural Networks and Word Embedding
    Ebenezer Ojo, Olumide
    Ta, Thang-Hoang
    Gelbukh, Alexander
    Calvo, Hiram
    Sidorov, Grigori
    Oluwayemisi Adebanji, Olaronke
    COMPUTACION Y SISTEMAS, 2022, 26(2): 1007-1013
  • [38] Using Voice Activity Detection and Deep Neural Networks with Hybrid Speech Feature Extraction for Deceptive Speech Detection
    Mihalache, Serban
    Burileanu, Dragos
    SENSORS, 2022, 22(3)
  • [39] Methods for investigation of L2 speech rhythm: Insights from the production of English speech rhythm by L2 Arabic learners
    Algethami, Ghazi
    Hellmuth, Sam
    SECOND LANGUAGE RESEARCH, 2024, 40(2): 431-456
  • [40] EXPLORING NON-AUTOREGRESSIVE END-TO-END NEURAL MODELING FOR ENGLISH MISPRONUNCIATION DETECTION AND DIAGNOSIS
    Wang, Hsin-Wei
    Yan, Bi-Cheng
    Chiu, Hsuan-Sheng
    Hsu, Yung-Chang
    Chen, Berlin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 6817-6821