DYSARTHRIC SPEECH RECOGNITION WITH LATTICE-FREE MMI

Cited by: 0
Authors
Hermann, Enno [1, 2]
Magimai-Doss, Mathew [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne EPFL, Lausanne, Switzerland
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
EU Horizon 2020;
Keywords
Speech recognition; pathological speech processing; dysarthria; LF-MMI; ASR;
DOI
10.1109/icassp40776.2020.9053549
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Recognising dysarthric speech is a challenging problem as it differs in many aspects from typical speech, such as speaking rate and pronunciation. In the literature the focus so far has largely been on handling these variabilities in the framework of HMM/GMM and cross-entropy based HMM/DNN systems. This paper focuses on the use of state-of-the-art sequence-discriminative training, in particular lattice-free maximum mutual information (LF-MMI), for improving dysarthric speech recognition. Through a systematic investigation on the Torgo corpus we demonstrate that LF-MMI performs well on such atypical data and compensates much better for the low speaking rates of dysarthric speakers than conventionally trained systems. This can be attributed to inherent aspects of current speech recognition training regimes, like frame subsampling and speed perturbation, which obviate the need for some techniques previously adopted specifically for dysarthric speech.
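The two training-regime ingredients named in the abstract, speed perturbation and frame subsampling, can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline: the perturbation factors (0.9, 1.0, 1.1), the subsampling factor of 3, the linear-interpolation resampler, and the function names `speed_perturb` and `subsample_frames` are all assumptions chosen for illustration, since the abstract does not state the values used.

```python
def speed_perturb(samples, factor):
    """Resample by linear interpolation so playback is `factor` times faster.

    A factor below 1.0 slows the signal down (more samples);
    a factor above 1.0 speeds it up (fewer samples).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                      # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)    # clamp at the last sample
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def subsample_frames(frames, factor=3):
    """Keep every `factor`-th frame, lowering the frame rate the model sees."""
    return frames[::factor]


# Toy waveform: a ramp of 1000 samples (hypothetical data).
signal = [float(i) for i in range(1000)]

# Assumed perturbation factors; each copy would be added to the training set.
augmented = {f: speed_perturb(signal, f) for f in (0.9, 1.0, 1.1)}

# Toy sequence of 100 frame indices, subsampled by the assumed factor of 3.
frames = list(range(100))
kept = subsample_frames(frames)
```

In this sketch the 0.9 copy is longer than the original and the 1.1 copy is shorter, so a model trained on all three sees each utterance at several speaking rates, while subsampling reduces the number of frames evaluated per utterance.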
Pages: 6109-6113
Page count: 5