DYSARTHRIC SPEECH RECOGNITION WITH LATTICE-FREE MMI

Authors
Hermann, Enno [1, 2]
Magimai-Doss, Mathew [1 ]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne EPFL, Lausanne, Switzerland
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
EU Horizon 2020;
Keywords
Speech recognition; pathological speech processing; dysarthria; LF-MMI; ASR;
DOI
10.1109/icassp40776.2020.9053549
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Recognising dysarthric speech is a challenging problem as it differs in many aspects from typical speech, such as speaking rate and pronunciation. In the literature the focus so far has largely been on handling these variabilities in the framework of HMM/GMM and cross-entropy based HMM/DNN systems. This paper focuses on the use of state-of-the-art sequence-discriminative training, in particular lattice-free maximum mutual information (LF-MMI), for improving dysarthric speech recognition. Through a systematic investigation on the Torgo corpus we demonstrate that LF-MMI performs well on such atypical data and compensates much better for the low speaking rates of dysarthric speakers than conventionally trained systems. This can be attributed to inherent aspects of current speech recognition training regimes, like frame subsampling and speed perturbation, which obviate the need for some techniques previously adopted specifically for dysarthric speech.
Pages: 6109-6113
Page count: 5