Emotion recognition at a distance: The robustness of machine learning based on hand-crafted facial features vs deep learning models

Cited: 9
Authors
Bisogni, Carmen [1 ]
Cimmino, Lucia [1 ]
De Marsico, Maria [2 ]
Hao, Fei [3 ]
Narducci, Fabio [1 ]
Affiliations
[1] Univ Salerno, Dept Comp Sci, I-84084 Salerno, Italy
[2] Univ Roma La Sapienza, Dept Comp Sci, I-00185 Rome, Italy
[3] Shaanxi Normal Univ, Sch Comp Sci, Xian 710119, Peoples R China
Keywords
Emotion recognition; Facial expression; Expression recognition at a distance; Deep learning; Machine learning; Mediapipe; EXPRESSION;
DOI
10.1016/j.imavis.2023.104724
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion estimation from facial expression analysis is nowadays a widely-explored computer vision task. In turn, the classification of expressions relies on relevant facial features and their dynamics. Despite the promising accuracy results achieved in controlled and favorable conditions, the processing of faces acquired at a distance, entailing low-quality images, still suffers from a significant performance decrease. In particular, most approaches and related computational models become extremely unstable given the very small number of useful pixels typical of these conditions. Therefore, their behavior should be investigated more carefully. On the other hand, real-time emotion recognition at a distance may play a critical role in smart video surveillance, especially when monitoring particular kinds of events, e.g., political meetings, to try to prevent adverse actions. This work compares facial expression recognition at a distance by: 1) a deep learning architecture based on state-of-the-art (SOTA) proposals, which exploits the whole images to autonomously learn the relevant embeddings; 2) a machine learning approach that relies on hand-crafted features, namely the facial landmarks preliminarily extracted using the popular Mediapipe framework. Instead of using either the complete sequence of frames or only the final still image of the expression, like current SOTA approaches, the two proposed methods are designed to use rich temporal information to identify three different stages of emotion. Expressions are time-split accordingly into four phases to better exploit their temporal-dependent dynamics. Experiments were conducted on the popular Extended Cohn-Kanade dataset (CK+). It was chosen for its wide use in the related literature, and because it includes videos of facial expressions and not only still images.
The results show that the approach relying on machine learning via hand-crafted features is more suitable for classifying the initial phases of the expression and does not decay in accuracy when images are at a distance (only a 0.08% accuracy decay). On the contrary, deep learning not only has difficulties classifying the initial phases of the expressions but also suffers from a relevant performance decay when considering images at a distance (52.68% accuracy decay). © 2023 Elsevier B.V. All rights reserved.
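The time-splitting of an expression sequence described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the four phase boundaries are chosen, so equal-length contiguous chunks are an assumption here, and the function name `split_into_phases` is hypothetical, not taken from the paper.

```python
def split_into_phases(frames, n_phases=4):
    """Split a frame sequence into contiguous temporal phases.

    The paper's abstract describes time-splitting each expression
    video (neutral onset through apex) into four phases; splitting
    into equal-length contiguous chunks is an illustrative assumption.
    """
    n = len(frames)
    # Phase boundaries at evenly spaced positions along the sequence.
    bounds = [round(i * n / n_phases) for i in range(n_phases + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(n_phases)]

# Example: a 10-frame sequence split into 4 phases; per-phase features
# (e.g., Mediapipe landmarks) would then be computed on each chunk.
phases = split_into_phases(list(range(10)), n_phases=4)
```

Each phase can then be fed to a phase-specific classifier, which matches the abstract's point that early phases carry different (and, for the hand-crafted pipeline, more robust) information than the apex frame alone.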
Pages: 15