The Detection of Depression Using Multimodal Models Based on Text and Voice Quality Features

Cited by: 13
Authors
Solieman, Hanadi [1]
Pustozerov, Evgenii A. [1, 2]
Affiliations
[1] St Petersburg Electrotech Univ LETI, St Petersburg, Russia
[2] Almazov Natl Med Res Ctr, St Petersburg, Russia
Source
PROCEEDINGS OF THE 2021 IEEE CONFERENCE OF RUSSIAN YOUNG RESEARCHERS IN ELECTRICAL AND ELECTRONIC ENGINEERING (ELCONRUS), 2021
Keywords
Depression; Deep Learning; text analysis; voice quality; semi-contextual; word-level; speaker-independent; DAIC-WOZ; CLASSIFICATION
DOI
10.1109/ElConRus51938.2021.9396540
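Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract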
This article demonstrates, as a proof of concept, that depression can be diagnosed automatically from audio recordings of individuals' voices. The DAIC-WOZ database was used as the data source. Audio and textual data were preprocessed and converted into a set of optimized parameters for two models. Deep Learning models were used to detect depression from the transcripts of the audio recordings and from voice quality features. We created a text analysis model at the word level using Natural Language Processing (NLP) techniques, and a voice quality analysis model along the tense-to-breathy dimension. The text analysis model achieved its best performance with an F1-score of 0.8 (0.42) for non-depressed (depressed) individuals, while the voice quality model scored 0.76 (0.38). As a result, we obtained two models intended for implementation in a system for the diagnosis of depression.
Pages: 1843-1848
Number of pages: 6