Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Cited by: 12
Authors
Korzekwa, Daniel [1 ]
Barra-Chicote, Roberto [1 ]
Kostek, Bozena [2 ]
Drugman, Thomas [1 ]
Lajszczak, Mateusz [1 ]
Affiliations
[1] Amazon TTS Res, Cambridge, England
[2] Gdansk Univ Technol, Fac ETI, Gdansk, Poland
Source
INTERSPEECH 2019 | 2019
Keywords
dysarthria detection; speech recognition; speech synthesis; interpretable deep learning models;
DOI
10.21437/Interspeech.2019-1206
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes
100104; 100213
Abstract
We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve the dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not provide interpretable outputs. On the contrary, we show that this latent space successfully encodes interpretable characteristics of dysarthria, is effective at detecting dysarthria, and that manipulation of the latent space allows the model to reconstruct healthy speech from dysarthric speech. This work can help patients and speech pathologists improve their understanding of the condition, lead to more accurate diagnoses, and aid in reconstructing healthy speech for afflicted patients.
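As a rough illustration of the architecture the abstract describes (not the authors' released code), one can picture an encoder that compresses a mel-spectrogram into a low-dimensional latent vector, with two heads trained jointly: a classifier that detects dysarthria from the latent vector and a decoder that reconstructs the spectrogram from it. The layer sizes, the 4-dimensional latent space, and the use of PyTorch below are assumptions made for the sketch only.

# Hypothetical sketch, not the paper's implementation: layer sizes, latent
# dimensionality, and framework choice are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DysarthriaMultiTask(nn.Module):
    """Encoder -> low-dim latent -> (detection head, reconstruction decoder)."""

    def __init__(self, n_mels: int = 80, latent_dim: int = 4):
        super().__init__()
        # Encoder: summarize a mel-spectrogram into one small latent vector.
        self.encoder = nn.GRU(n_mels, 128, batch_first=True)
        self.to_latent = nn.Linear(128, latent_dim)
        # Detection head: latent vector -> dysarthria logit.
        self.detector = nn.Linear(latent_dim, 1)
        # Decoder: latent vector (repeated over time) -> reconstructed frames.
        self.decoder = nn.GRU(latent_dim, 128, batch_first=True)
        self.to_mels = nn.Linear(128, n_mels)

    def forward(self, mels):                      # mels: (batch, frames, n_mels)
        _, h = self.encoder(mels)                 # h: (1, batch, 128)
        z = self.to_latent(h[-1])                 # z: (batch, latent_dim)
        logit = self.detector(z).squeeze(-1)      # detection logit per utterance
        z_seq = z.unsqueeze(1).repeat(1, mels.size(1), 1)
        dec, _ = self.decoder(z_seq)
        recon = self.to_mels(dec)                 # reconstructed spectrogram
        return logit, recon, z

def multitask_loss(logit, recon, mels, label, alpha=1.0):
    # Joint objective: detection (binary cross-entropy) + reconstruction (MSE).
    return (F.binary_cross_entropy_with_logits(logit, label)
            + alpha * F.mse_loss(recon, mels))

# Toy usage: two random "utterances" of 120 frames, one labeled dysarthric.
model = DysarthriaMultiTask()
mels = torch.randn(2, 120, 80)
labels = torch.tensor([1.0, 0.0])
logit, recon, z = model(mels)
multitask_loss(logit, recon, mels, labels).backward()

In this sketch, the latent-space manipulation mentioned in the abstract would correspond to editing z toward the region the model associates with healthy speech before decoding; the actual procedure used in the paper may differ.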
Pages: 3890-3894
Page count: 5