Integrating Deep Facial Priors Into Landmarks for Privacy Preserving Multimodal Depression Recognition

Cited by: 14

Authors:

Pan, Yuchen [1,2]

Shang, Yuanyuan [1,2]

Shao, Zhuhong [1,3]

Liu, Tie [1,3]

Guo, Guodong [4]

Ding, Hui [1,3]

Affiliations:

[1] Capital Normal Univ, Coll Informat Engn, Beijing 100048, Peoples R China

[2] Beijing Key Lab Elect Syst Reliabil Technol, Beijing 100048, Peoples R China

[3] Beijing Engn Res Ctr Highly Reliable Embedded Syst, Beijing 100048, Peoples R China

[4] West Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA

Source:

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING | 2024 / Vol. 15 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Depression; Feature extraction; Face recognition; Visualization; Training; Image recognition; Deep learning; Depression recognition; multimodal; spatial-temporal attention; video recognition; APPEARANCE;

DOI:

10.1109/TAFFC.2023.3296318

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic depression diagnosis is a challenging problem that requires integrating spatial-temporal information and extracting features from audio-visual signals. For privacy protection, the trend toward recognition algorithms based on facial landmarks has created additional challenges. In this article, we propose an audio-visual attention network (AVA-DepressNet) for depression recognition. It is a novel multimodal framework with facial privacy protection that uses attention-based modules to enhance audio-visual spatial and temporal features. In addition, an adversarial multistage (AMS) training strategy is developed to optimize the encoder-decoder structure, and prior knowledge of facial structure is used in AMS training. AVA-DepressNet is evaluated on popular audio-visual depression datasets: AVEC 2013, AVEC 2014, and AVEC 2017. The results show that our approach achieves state-of-the-art or competitive performance for depression recognition.


Pages: 828-836

Number of pages: 9
