THE IN-THE-WILD SPEECH MEDICAL CORPUS

Cited by: 5
Authors
Correia, Joana [1, 2]
Teixeira, Francisco [2]
Botelho, Catarina [2]
Trancoso, Isabel [2]
Raj, Bhiksha [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Univ Lisbon, INESC ID, Lisbon, Portugal
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
Speech affecting diseases; pathological speech; in-the-wild; i-vectors; x-vectors; PARKINSONS-DISEASE
DOI
10.1109/ICASSP39728.2021.9414230
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Automatic detection of speech-affecting (SA) diseases has received significant attention, particularly in clinical scenarios. However, the same task under in-the-wild conditions is often neglected, in part due to the lack of appropriate datasets. In this work, we present the in-the-Wild Speech Medical (WSM) Corpus, a collection of in-the-wild videos featuring subjects potentially affected by an SA disease, specifically depression or Parkinson's disease. The WSM Corpus contains a total of 928 videos and over 131 hours of speech. Each video is accompanied by a crowdsourced annotation of the speaker's perceived age and gender and self-reported health status. The WSM Corpus is balanced across all labels. We give a detailed description of the collection and annotation processes of the WSM Corpus. Furthermore, we present several baseline systems for the detection of SA diseases using speech alone, thus motivating the use of this type of in-the-wild data in paralinguistic audiovisual tasks.
Pages: 6973-6977
Page count: 5