A Survey on Interpretability of Facial Expression Recognition

Cited by: 0
Authors
Zhang, Miao-Xuan [1 ]
Zhang, Hong-Gang [1 ]
Affiliation
[1] School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2024 / Vol. 47 / No. 12
Funding
National Natural Science Foundation of China
Keywords
affective computing; computer vision; facial expression recognition; interpretability; machine learning;
DOI
10.11897/SP.J.1016.2024.02819
Abstract
In recent years, Facial Expression Recognition (FER) has been widely applied in medicine, social robotics, communication, security, and many other fields. A growing number of researchers have taken an interest in FER and proposed effective algorithms. At the same time, the study of FER interpretability has attracted increasing attention, as it deepens our understanding of the models and helps ensure fairness, privacy preservation, and robustness.

In this paper, we summarize interpretability work in FER under three headings: result interpretability, mechanism interpretability, and model interpretability. Result interpretability denotes the extent to which people with relevant experience can consistently understand a model's outputs. Result-interpretable FER mainly comprises methods based on text descriptions and on the basic structure of the face; the face-structure-based methods in turn include approaches based on facial Action Units (AUs), topological modeling, caricature images, and interference analysis. Mechanism interpretability focuses on explaining the internal workings of the models, covering attention mechanisms in FER as well as interpretability methods based on feature decoupling and concept learning. For model interpretability, researchers try to uncover the decision principles or rules of the models; this paper illustrates the interpretable classification methods in FER that belong to this category, including approaches based on Multi-Kernel Support Vector Machines (MKSVM) and on decision trees and deep forests.

We then compare and analyze the FER interpretability works and identify current problems in this area: the lack of evaluation metrics for FER interpretability analysis, the difficulty of balancing the accuracy and interpretability of FER models, and the scarcity of interpretability data for expression recognition.

Finally, we discuss future directions. The first concerns the interpretability of complex expression recognition, mainly compound expressions and finer-grained expressions. The second is the interpretability of multi-modal emotion recognition: multi-modal models achieve better performance by exploiting the complementary information of each modality, and their interpretability analysis is an important direction worth exploring. We also believe that the interpretability of expression and emotion recognition with large models is another significant future direction, covering Large Vision Models, Vision-Language Models, and Multi-modal Large Models; interpretability studies can help improve the safety and reliability of large models. Lastly, we address enhancing generalization through interpretability. When models learn "correlation" rather than "causation", they are prone to wrong judgments when encountering new data or when affected by other factors; that is, they do not generalize well. Interpretability analysis helps deepen our understanding of the nature of the models and explains the causal relationship between input and output, thereby improving generalization performance.
This paper intends to provide interested researchers with a comprehensive review and analysis of the current state of research on the interpretability of facial expression recognition, thereby promoting further advancements in this field. © 2024 Science Press. All rights reserved.
Pages: 2819-2851 (32 pages)
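To make the mechanism-interpretability category concrete, below is a minimal PyTorch sketch of an attention-weighted FER classifier whose learned spatial attention map can be upsampled and overlaid on the input face as a heatmap, showing which regions drove the prediction. It is illustrative only: the architecture, the `AttentionFER` name, and the seven-class output are assumptions, not a method from any of the surveyed works.

```python
# Hypothetical sketch of attention-based mechanism interpretability for FER.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFER(nn.Module):
    def __init__(self, n_classes: int = 7):
        super().__init__()
        # Tiny CNN backbone; a real system would use a pretrained network.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 1x1 conv produces one attention logit per spatial location.
        self.att = nn.Conv2d(64, 1, kernel_size=1)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):
        feats = self.backbone(x)                            # (B, 64, H', W')
        a = self.att(feats)                                 # (B, 1, H', W')
        a = torch.softmax(a.flatten(2), -1).view_as(a)      # normalize over space
        pooled = (feats * a).sum(dim=(2, 3))                # attention-weighted pooling
        return self.fc(pooled), a                           # logits + inspectable map

if __name__ == "__main__":
    model = AttentionFER()
    img = torch.randn(1, 3, 112, 112)
    logits, att_map = model(img)
    # Upsample the attention map to image size for a qualitative overlay.
    heat = F.interpolate(att_map, size=img.shape[-2:], mode="bilinear",
                         align_corners=False)
    print(logits.shape, heat.shape)  # (1, 7) and (1, 1, 112, 112)
```

Because the pooled feature is an explicit convex combination over spatial positions, the map `a` directly quantifies each region's contribution, which is the inspectable quantity this family of methods exposes.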
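Similarly, the model-interpretability route surveyed above can be illustrated with a shallow decision tree trained on binary facial Action Unit (AU) activations, whose if/then rules can be printed and audited directly. The AU choices and the toy data below follow FACS-style intuitions (e.g., AU6 + AU12 suggesting happiness) purely for illustration and are not taken from the survey.

```python
# Hypothetical sketch: an auditable decision tree over AU activations.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy dataset: rows are binary activations of [AU6 cheek raiser,
# AU12 lip corner puller, AU4 brow lowerer]; labels are illustrative.
X = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
y = ["happy", "happy", "angry", "angry", "happy", "neutral"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# export_text yields human-readable rules -- the interpretability payoff.
print(export_text(tree, feature_names=["AU6", "AU12", "AU4"]))
```

The printed rules (e.g., a split on AU12, then on AU4) constitute the model's entire decision principle, in contrast to the post-hoc explanations required for deep networks.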