Handling acoustic variation in dysarthric speech recognition systems through model combination

Cited by: 2
Authors
Hermann, Enno [1 ,2 ]
Magimai-Doss, Mathew [1 ]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Federale Lausanne, Lausanne, Switzerland
Source
INTERSPEECH 2021 | 2021
Keywords
speech recognition; pathological speech; dysarthria
DOI
10.21437/Interspeech.2021-2212
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Developing automatic speech recognition (ASR) systems that recognise dysarthric speech as well as control speech from unimpaired speakers remains challenging. Including more highly variable dysarthric speech during training can also negatively affect the performance on control speakers, which is not desirable when developing speech recognisers for a wider audience. In this work, we analyse how the acoustic variability of dysarthric speech affects ASR systems and propose the combination of multiple acoustic models trained on different subsets of speakers to mitigate this effect. This approach shows improvements for both dysarthric and control speakers on the Torgo and UA-Speech corpora.
Pages: 4788-4792
Number of pages: 5