The Detection of Depression Using Multimodal Models Based on Text and Voice Quality Features

Cited by: 13

Authors:

Solieman, Hanadi [1]

Pustozerov, Evgenii A. [1,2]

Affiliations:

[1] St Petersburg Electrotech Univ LETI, St Petersburg, Russia

[2] Almazov Natl Med Res Ctr, St Petersburg, Russia

Source:

PROCEEDINGS OF THE 2021 IEEE CONFERENCE OF RUSSIAN YOUNG RESEARCHERS IN ELECTRICAL AND ELECTRONIC ENGINEERING (ELCONRUS) | 2021

Keywords:

Depression; Deep Learning; text analysis; voice quality; semi-contextual; word-level; speaker-independent; DAIC-WOZ; CLASSIFICATION

DOI:

10.1109/ElConRus51938.2021.9396540

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

The article demonstrates, as a proof of concept, that depression can be diagnosed automatically from audio recordings of individuals' voices. The DAIC-WOZ database was used as the data source. Audio and textual data were preprocessed and converted to a set of optimized parameters for two models. Appropriate Deep Learning models were used to detect depression from the transcripts of the audio recordings and from voice quality features. We created a word-level text analysis model using Natural Language Processing (NLP) techniques, and a voice quality analysis model on the tense-to-breathy dimension. The text analysis model achieved its best performance with an F1-score of 0.8 (0.42) for non-depressed (depressed) individuals, while the voice quality model scored 0.76 (0.38). As a result, we obtained two models to be implemented in a system for the diagnosis of depression.

Pages: 1843-1848

Page count: 6
