Phonological Feature Based Mispronunciation Detection and Diagnosis using Multi-Task DNNs and Active Learning

Cited by: 8
Authors
Arora, Vipul [1]
Lahiri, Aditi [1]
Reetz, Henning [2]
Affiliations
[1] Univ Oxford, Fac Linguist Philol & Phonet, Oxford, England
[2] Goethe Univ, Frankfurt, Germany
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Funding
European Research Council
Keywords
computer-aided pronunciation training; phonological features; multi-task DNNs; active learning; ACOUSTIC MODELS; SPEECH;
DOI
10.21437/Interspeech.2017-1350
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a phonological-feature-based computer-aided pronunciation training system for learners of a new language (L2). Phonological features allow the learners' mispronunciations to be analysed systematically and the feedback to be rendered more effectively. The proposed acoustic model consists of a multi-task deep neural network, which uses a shared representation for estimating the phonological features and HMM state probabilities. Moreover, an active-learning-based scheme is proposed to deal efficiently with the cost of annotation, which is done by expert teachers, by selecting the most informative samples for annotation. Experimental evaluations are carried out for German and Italian native speakers speaking English. For mispronunciation detection, the proposed feature-based system outperforms the conventional GOP measure and classifier-based methods, while providing a more detailed diagnosis. Evaluations also demonstrate the advantage of active-learning-based sampling over random sampling.
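The abstract does not say how "the most informative samples" are chosen. As an illustration only, a common active learning heuristic is uncertainty sampling: rank unlabelled samples by the entropy of the model's posterior and send the top few to the annotator. The sketch below assumes this heuristic; the function names `entropy` and `select_for_annotation`, the `budget` parameter, and the toy posteriors are hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(posteriors, budget):
    """Return indices of the `budget` samples with the highest posterior entropy.

    Assumption: high entropy stands in for "most informative". The paper's
    actual selection criterion is not given in the abstract.
    """
    ranked = sorted(
        range(len(posteriors)),
        key=lambda i: entropy(posteriors[i]),
        reverse=True,
    )
    return ranked[:budget]


# Toy posteriors over (correct, mispronounced) for four unlabelled samples.
posteriors = [
    [0.98, 0.02],  # model is confident
    [0.55, 0.45],  # model is unsure
    [0.80, 0.20],
    [0.50, 0.50],  # model is maximally unsure
]
chosen = select_for_annotation(posteriors, budget=2)
print(chosen)  # [3, 1]
```

Random sampling, the baseline named in the abstract, would instead draw `budget` indices uniformly without looking at the posteriors.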
Pages: 1432-1436
Page count: 5