Facial Expression Recognition Based on Deep Spatio-Temporal Attention Network

Cited: 0
Authors
Li, Shuqin [1 ,2 ]
Zheng, Xiangwei [1 ,2 ]
Zhang, Xia [3 ]
Chen, Xuanchi [1 ,2 ]
Li, Wei [4 ]
Affiliations
[1] Shandong Normal Univ, Sch Informat Sci & Engn, Jinan, Peoples R China
[2] Shandong Prov Key Lab Distributed Comp Software N, Jinan, Peoples R China
[3] Taian City Cent Hosp, Internet Diag & Treatment Ctr, Tai An, Shandong, Peoples R China
[4] Shandong Normal Univ Lib, Shandong Normal Univ, Jinan, Peoples R China
Source
COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2022, PT II | 2022 / Volume 461
关键词
Facial expression recognition; Spatio-temporal features; Deep attention network;
DOI
10.1007/978-3-031-24386-8_28
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
Facial expression recognition is critical to human-computer interaction. Existing methods tend to focus on a single facial feature and do not take full advantage of the integrated spatio-temporal features of facial expression images. This paper therefore proposes a facial expression recognition method based on a deep spatio-temporal attention network (STANER) that captures the spatio-temporal features of subtle expression changes. First, a recognition branch with an attention module based on spatial global features (SGAER) is built; the attention module quantifies the importance of each part of the expression feature map and thereby extracts global spatial appearance features of subtle expression changes from a single-frame expression image. Then, a recognition branch with a C-LSTM based on temporal local features (TLER) is built to process image sequences of the facial regions involved in expression formation and to extract dynamic local temporal information about expressions. Experiments on the CK+ and Oulu-CASIA datasets show that STANER achieves accuracy rates of 98.23% and 89.52% on the two mainstream datasets, respectively.
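The abstract describes the SGAER attention module only at a high level: it assigns each part of the expression feature map an importance weight and reweights the map accordingly. The paper's actual layer design is not given here, so the following is a minimal pure-Python sketch of that general idea (softmax-normalized per-position weights applied to a 2-D feature map); the scoring rule and all function names are hypothetical stand-ins, not the authors' implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a flat list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_attention(feature_map):
    """Reweight each spatial position of a 2-D feature map by an
    importance weight. Here the score of a position is simply its own
    activation -- a placeholder for the small learned scoring layer an
    attention module such as SGAER would use."""
    h, w = len(feature_map), len(feature_map[0])
    scores = [feature_map[i][j] for i in range(h) for j in range(w)]
    weights = softmax(scores)
    # Scale by h*w so a uniform map is left unchanged by uniform weights.
    attended = [[feature_map[i][j] * weights[i * w + j] * h * w
                 for j in range(w)] for i in range(h)]
    return attended, weights
```

Positions with larger scores receive larger softmax weights, so salient regions of the expression map are amplified relative to the rest, which is the effect the abstract attributes to the attention module.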
Pages: 516-532 (17 pages)