Phonological feature-based speech recognition system for pronunciation training in non-native language learning

Cited by: 18
Authors
Arora, Vipul [1]
Lahiri, Aditi [1]
Reetz, Henning [2]
Affiliations
[1] Univ Oxford, Fac Linguist Philol & Phonet, Oxford, England
[2] Goethe Univ, Frankfurt, Germany
Funding
European Research Council
Keywords
MISPRONUNCIATION DETECTION; ACOUSTIC INVARIANCE; STOP CONSONANTS; VISUAL FEEDBACK; ARTICULATION; DIAGNOSIS; FRAMEWORK; MODELS; PLACE
DOI
10.1121/1.5017834
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The authors address the question of whether phonological features can be used effectively in an automatic speech recognition (ASR) system for pronunciation training in non-native language (L2) learning. Computer-aided pronunciation training consists of two essential tasks: detecting mispronunciations and providing corrective feedback, usually on the basis of either full words or phonemes. Phonemes, however, can be further decomposed into phonological features, which in turn define groups of phonemes. A phonological feature-based ASR system allows the authors to perform a sub-phonemic analysis at the feature level, providing more effective feedback for reaching the acoustic goal and perceptual constancy. Furthermore, phonological features provide a structured way of analysing the types of errors a learner makes, and can readily convey which pronunciations need improvement. This paper presents the authors' implementation of such an ASR system using deep neural networks as an acoustic model, and its use for detecting mispronunciations, analysing errors, and rendering corrective feedback. Quantitative as well as qualitative evaluations are carried out for German and Italian learners of English. In addition to achieving high accuracy of mispronunciation detection, the system also provides accurate diagnosis of errors. © 2018 Acoustical Society of America.
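As a rough illustration of the sub-phonemic analysis described in the abstract, the following minimal sketch compares a target phoneme with the phoneme a learner produced and reports only the features that differ. The feature inventory, the feature names, and the helpers `diagnose` and `phonemes_with` are hypothetical simplifications chosen for this example; they are not the feature set or the implementation used in the paper, which relies on a deep neural network acoustic model rather than a lookup table.

```python
# Toy feature inventory: an assumption for illustration only,
# not the phonological feature set used by the authors.
FEATURES = {
    "p": {"voiced": False, "place": "labial", "manner": "stop"},
    "b": {"voiced": True, "place": "labial", "manner": "stop"},
    "t": {"voiced": False, "place": "coronal", "manner": "stop"},
    "d": {"voiced": True, "place": "coronal", "manner": "stop"},
    "s": {"voiced": False, "place": "coronal", "manner": "fricative"},
    "z": {"voiced": True, "place": "coronal", "manner": "fricative"},
}


def diagnose(target, produced):
    """Return {feature: (expected, observed)} for every feature that differs."""
    expected = FEATURES[target]
    observed = FEATURES[produced]
    return {
        name: (expected[name], observed[name])
        for name in expected
        if expected[name] != observed[name]
    }


def phonemes_with(**constraints):
    """Return the sorted phonemes whose features match all given constraints."""
    return sorted(
        phoneme
        for phoneme, feats in FEATURES.items()
        if all(feats[key] == value for key, value in constraints.items())
    )


if __name__ == "__main__":
    # A /z/ realized as /s/ differs only in voicing.
    print(diagnose("z", "s"))
    # A single feature combination picks out a group of phonemes.
    print(phonemes_with(voiced=True, manner="stop"))
```

In this toy setting, a /z/ produced as /s/ yields a single differing feature (voicing), so feedback could target that one feature instead of flagging the whole phoneme as wrong. The second helper shows the converse relation mentioned in the abstract: a combination of features defines a group of phonemes.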
Pages: 98-108
Page count: 11