Emotion recognition at a distance: The robustness of machine learning based on hand-crafted facial features vs deep learning models

Cited: 9
Authors
Bisogni, Carmen [1 ]
Cimmino, Lucia [1 ]
De Marsico, Maria [2 ]
Hao, Fei [3 ]
Narducci, Fabio [1 ]
Affiliations
[1] Univ Salerno, Dept Comp Sci, I-84084 Salerno, Italy
[2] Univ Roma La Sapienza, Dept Comp Sci, I-00185 Rome, Italy
[3] Shaanxi Normal Univ, Sch Comp Sci, Xian 710119, Peoples R China
Keywords
Emotion recognition; Facial expression; Expression recognition at a distance; Deep learning; Machine learning; Mediapipe; EXPRESSION;
DOI
10.1016/j.imavis.2023.104724
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion estimation from facial expression analysis is nowadays a widely-explored computer vision task. In turn, the classification of expressions relies on relevant facial features and their dynamics. Despite the promising accuracy results achieved in controlled and favorable conditions, the processing of faces acquired at a distance, entailing low-quality images, still suffers from a significant performance decrease. In particular, most approaches and related computational models become extremely unstable given the very small number of useful pixels typical of these conditions. Therefore, their behavior should be investigated more carefully. On the other hand, real-time emotion recognition at a distance may play a critical role in smart video surveillance, especially when monitoring particular kinds of events, e.g., political meetings, to try to prevent adverse actions. This work compares facial expression recognition at a distance by: 1) a deep learning architecture based on state-of-the-art (SOTA) proposals, which exploits the whole images to autonomously learn the relevant embeddings; 2) a machine learning approach that relies on hand-crafted features, namely the facial landmarks preliminarily extracted using the popular Mediapipe framework. Instead of using either the complete sequence of frames or only the final still image of the expression, like current SOTA approaches, the two proposed methods are designed to use rich temporal information to identify three different stages of emotion. Expressions are time-split accordingly into four phases to better exploit their temporal-dependent dynamics. Experiments were conducted on the popular Extended Cohn-Kanade dataset (CK+). It was chosen for its wide use in the related literature, and because it includes videos of facial expressions and not only still images.
The results show that the approach relying on machine learning via hand-crafted features is more suitable for classifying the initial phases of the expression and does not decay in accuracy when images are at a distance (only a 0.08% accuracy decay). On the contrary, deep learning not only has difficulties classifying the initial phases of the expressions but also suffers from a relevant performance decay when considering images at a distance (52.68% accuracy decay). © 2023 Elsevier B.V. All rights reserved.
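The time-splitting of an expression sequence described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the four phase boundaries are chosen, so equal-length contiguous chunks are an assumption here, and the function name `split_into_phases` is hypothetical, not taken from the paper.

```python
def split_into_phases(frames, n_phases=4):
    """Split a frame sequence into contiguous temporal phases.

    The paper's abstract describes time-splitting each expression
    video (neutral onset through apex) into four phases; splitting
    into equal-length contiguous chunks is an illustrative assumption.
    """
    n = len(frames)
    # Phase boundaries at evenly spaced positions along the sequence.
    bounds = [round(i * n / n_phases) for i in range(n_phases + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(n_phases)]

# Example: a 10-frame sequence split into 4 phases; per-phase features
# (e.g., Mediapipe landmarks) would then be computed on each chunk.
phases = split_into_phases(list(range(10)), n_phases=4)
```

Each phase can then be fed to a phase-specific classifier, which matches the abstract's point that early phases carry different (and, for the hand-crafted pipeline, more robust) information than the apex frame alone.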
Pages: 15