MMPF: Multimodal Purification Fusion for Automatic Depression Detection

Cited by: 4
Authors
Yang, Biao [1]
Cao, Miaomiao [2]
Zhu, Xianlin [3]
Wang, Suhong [3]
Yang, Changchun [1]
Ni, Rongrong [1]
Liu, Xiaofeng [4]
Affiliations
[1] Changzhou Univ, Sch Microelect & Control Engn, Changzhou 213000, Jiangsu, Peoples R China
[2] Changzhou Univ, Sch Comp Sci & Artificial Intelligence, Changzhou 213000, Jiangsu, Peoples R China
[3] Soochow Univ, Clin Psychol, Affiliated Hosp 3, Changzhou 213000, Jiangsu, Peoples R China
[4] Hohai Univ, Sch Internet Things Engn, Changzhou 213000, Jiangsu, Peoples R China
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024, Vol. 11, No. 6
Keywords
Automatic depression detection (ADD); contrastive learning; dynamic corrective learning (DCL); multimodal fusion; neural network
DOI
10.1109/TCSS.2024.3411616
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Depression is a common mental disorder that requires objective and valid assessment tools. However, purely data-driven methods cannot satisfy the clinical diagnostic criteria for automatic depression detection (ADD), and the instability and heterogeneity of multimodal data have not been fully resolved. Therefore, we propose a novel auxiliary tool for ADD based on multimodal purification fusion (MMPF). Initially, a prior constraint gating (PCG) strategy is used to inject doctors' constraints into depression data to guide and constrain the learning process. Then, we introduce text and audio encoders to extract unpurified features from preprocessed depression data. Afterward, multimodal purification refinement is proposed to extract unintersected common and specific features from unpurified features, generating purified features. Meanwhile, we leverage a multiperspective contrastive learning (MCL) strategy to enhance unpurified and purified features. Finally, modality interaction (MI) based on the transformer is proposed to conduct multimodal fusion. A dynamic corrective learning (DCL) strategy is introduced to tackle modality imbalances and inconsistent sentiment. MMPF is evaluated on the Distress Analysis Interview Corpus Wizard of Oz and performs promisingly in unimodal and multimodal depression detection, indicating its significant role in ADD.
Pages: 7421-7434
Page count: 14