Occlusion Aware Facial Expression Recognition Using CNN With Attention Mechanism

Cited by: 613

Authors:

Li, Yong [1,2]

Zeng, Jiabei [1]

Shan, Shiguang [3,4]

Chen, Xilin [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Chinese Acad Sci, Inst Comp Technol, Ctr Excellence Brain Sci & Intelligence Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China

[4] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2019 / Vol. 28 / Issue 05

Funding:

National Key R&D Program of China;

Keywords:

Facial expression recognition; occlusion; CNN with attention mechanism; gate unit; FACE RECOGNITION;

DOI:

10.1109/TIP.2018.2886767

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Facial expression recognition in the wild is challenging due to various unconstrained conditions. Although existing facial expression classifiers are almost perfect at analyzing constrained frontal faces, they fail to perform well on partially occluded faces, which are common in the wild. In this paper, we propose a convolutional neural network (CNN) with an attention mechanism (ACNN) that can perceive the occluded regions of the face and focus on the most discriminative un-occluded regions. ACNN is an end-to-end learning framework. It combines multiple representations from facial regions of interest (RoIs). Each representation is weighted via a proposed gate unit that computes an adaptive weight from the region itself according to its unobstructedness and importance. Considering different RoIs, we introduce two versions of ACNN: patch-based ACNN (pACNN) and global-local-based ACNN (gACNN). pACNN only pays attention to local facial patches. gACNN integrates local representations at the patch level with a global representation at the image level. The proposed ACNNs are evaluated on both real and synthetic occlusions, including a self-collected facial expression dataset with real-world occlusions, the two largest in-the-wild facial expression datasets (RAF-DB and AffectNet), and their modifications with synthesized facial occlusions. Experimental results show that ACNNs improve the recognition accuracy on both non-occluded and occluded faces. Visualization results demonstrate that, compared with the CNN without the gate unit, ACNNs are capable of shifting attention from the occluded patches to other related but unobstructed ones. ACNNs also outperform other state-of-the-art methods on several widely used in-the-lab facial expression datasets under the cross-dataset evaluation protocol.
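The gating idea in the abstract (each region representation receives an adaptive weight computed from the region itself, and the weighted representations are then combined) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the function name `gate_patches`, the single linear scoring layer with a sigmoid, the feature sizes, and fusion by concatenation.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, mapping scores to weights in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def gate_patches(patch_feats, w, b):
    """Hypothetical gate: weight each patch feature by a scalar computed from itself.

    patch_feats: array of shape (num_patches, feat_dim)
    w, b: parameters of an assumed linear scoring layer (feat_dim,) and scalar
    Returns the per-patch weights and the concatenated weighted features.
    """
    scores = patch_feats @ w + b            # one scalar score per patch
    weights = sigmoid(scores)               # adaptive weight in (0, 1)
    weighted = patch_feats * weights[:, None]
    return weights, weighted.reshape(-1)    # assumed fusion: concatenation

# Illustrative random inputs: 4 patches with 8-dimensional features.
rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(4, 8))
w = rng.normal(size=8)
weights, fused = gate_patches(patch_feats, w, 0.0)
```

In a trained model of this kind, a patch covered by an occluder would be expected to receive a weight near 0 so that it contributes little to the fused representation; the sketch shows only the weighting arithmetic, not the learning.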


Pages: 2439-2450

Page count: 12
