SPEECH-BASED DEPRESSION PREDICTION USING ENCODER-WEIGHT-ONLY TRANSFER LEARNING AND A LARGE CORPUS

Cited by: 17

Authors:

Harati, Amir [1]

Shriberg, Elizabeth [1]

Rutowski, Tomasz [1]

Chlebek, Piotr [1]

Lu, Yang [1]

Oliveira, Ricardo [1]

Affiliations:

[1] Ellipsis Health, San Francisco, CA 94102 USA

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

transfer learning; encoder/decoder; depression; behavioral health; mental health; emotion recognition

DOI:

10.1109/ICASSP39728.2021.9414208

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Speech-based algorithms have gained interest for the management of behavioral health conditions such as depression. We explore a speech-based transfer learning approach that uses a lightweight encoder and that transfers only the encoder weights, enabling a simplified run-time model. Our study uses a large data set containing roughly two orders of magnitude more speakers and sessions than used in prior work. The large data set enables reliable estimation of improvement from transfer learning. Results for the prediction of PHQ-8 labels show up to 27% relative performance gains for binary classification; these gains are statistically significant with a p-value close to zero. Improvements were also found for regression. Additionally, the gain from transfer learning does not appear to require strong source task performance. Results suggest that this approach is flexible and offers promise for efficient implementation.
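The abstract describes transferring only the encoder weights from a source-task model into the depression model. The sketch below illustrates that general idea with plain Python dictionaries, and also shows how a relative gain is computed. It is not the authors' implementation: the parameter names, the `encoder.` prefix, the helper functions, and all numeric values are hypothetical and chosen only for illustration.

```python
from copy import deepcopy


def transfer_encoder_weights(source_params, target_params, prefix="encoder."):
    """Return a copy of target_params whose encoder entries come from source_params.

    Only names that start with `prefix` and already exist in the target are
    copied; decoder weights from the source model are ignored.
    """
    merged = deepcopy(target_params)
    for name, value in source_params.items():
        if name.startswith(prefix) and name in merged:
            merged[name] = deepcopy(value)
    return merged


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline` (0.25 means 25%)."""
    return (improved - baseline) / baseline


# Hypothetical parameter tables: a source encoder/decoder model and a
# target model made of an encoder plus a prediction head.
source = {"encoder.layer0": [0.1, 0.2], "decoder.layer0": [0.9]}
target = {"encoder.layer0": [0.0, 0.0], "head.layer0": [0.5]}

merged = transfer_encoder_weights(source, target)
print(merged)

# Hypothetical scores, not results from the paper.
print(relative_gain(0.5, 0.625))
```

Because the decoder entries are never copied, the target model needs only the encoder and its own prediction head at run time, which is the simplification the abstract points to.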

Pages: 7273-7277

Page count: 5

