Integrating Deep Facial Priors Into Landmarks for Privacy Preserving Multimodal Depression Recognition

Cited by: 14
Authors
Pan, Yuchen [1, 2]
Shang, Yuanyuan [1, 2]
Shao, Zhuhong [1, 3]
Liu, Tie [1, 3]
Guo, Guodong [4]
Ding, Hui [1, 3]
Affiliations
[1] Capital Normal Univ, Coll Informat Engn, Beijing 100048, Peoples R China
[2] Beijing Key Lab Elect Syst Reliabil Technol, Beijing 100048, Peoples R China
[3] Beijing Engn Res Ctr Highly Reliable Embedded Syst, Beijing 100048, Peoples R China
[4] West Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
Funding
National Natural Science Foundation of China
Keywords
Depression; Feature extraction; Face recognition; Visualization; Training; Image recognition; Deep learning; Depression recognition; multimodal; spatial-temporal attention; video recognition; APPEARANCE;
DOI
10.1109/TAFFC.2023.3296318
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic depression diagnosis is a challenging problem that requires integrating spatial-temporal information and extracting features from audio-visual signals. The trend toward recognition algorithms based on facial landmarks, adopted for privacy protection, has created additional challenges. In this article, we propose an audio-visual attention network (AVA-DepressNet) for depression recognition. It is a novel multimodal framework with facial privacy protection that uses attention-based modules to enhance audio-visual spatial and temporal features. An adversarial multistage (AMS) training strategy is developed to optimize the encoder-decoder structure, and prior knowledge of facial structure is incorporated into AMS training. AVA-DepressNet is evaluated on three widely used audio-visual depression datasets: AVEC 2013, AVEC 2014, and AVEC 2017. The results show that our approach achieves state-of-the-art or competitive performance for depression recognition.
Pages: 828-836
Page count: 9