Lightweight ViT Model for Micro-Expression Recognition Enhanced by Transfer Learning

Times Cited: 10
Authors
Liu, Yanju [1]
Li, Yange [2]
Yi, Xinhai [2]
Hu, Zuojin [1]
Zhang, Huiyu [2]
Liu, Yanzhong [2]
Affiliations
[1] Nanjing Normal Univ Special Educ, Sch Math & Informat Sci, Nanjing, Peoples R China
[2] Qiqihar Univ, Sch Comp & Control Engn, Qiqihar, Peoples R China
Source
FRONTIERS IN NEUROROBOTICS | 2022, Vol. 16
Keywords
computer vision; deep learning; convolutional neural network; vision transformer; micro-expression recognition;
DOI
10.3389/fnbot.2022.922761
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In contrast to macro-expressions, micro-expressions are subtle, hard-to-detect emotional expressions that often carry rich information about mental activity. Practical micro-expression recognition is valuable in fields such as interrogation and healthcare. Neural networks are currently among the most common approaches to micro-expression recognition, but they typically grow more complex as accuracy improves, and very large networks impose steep hardware requirements on the devices that run them. In recent years, vision transformers based on self-attention mechanisms have achieved image recognition and classification accuracy on par with neural networks; their drawback is that, lacking the image-specific inductive biases built into neural networks, they pay for higher accuracy with an exponential increase in the number of parameters. This paper trains a facial expression feature extractor by transfer learning and then fine-tunes and optimizes the MobileViT model to perform the micro-expression recognition task. First, the CASME II, SAMM, and SMIC datasets are combined into a composite dataset, and macro-expression samples are extracted from three macro-expression datasets; every macro-expression and micro-expression sample is pre-processed identically so that the two kinds of samples are as similar as possible. Second, the macro-expression samples are used to train the MobileNetV2 block in MobileViT as a facial expression feature extractor, and the weights with the highest accuracy are saved. Finally, some of the hyperparameters of the MobileViT model are determined by grid search, the micro-expression samples are fed in for training, and the samples are classified with an SVM classifier. In the experiments, the proposed method achieves an accuracy of 84.27%, and processing an individual sample takes only 35.4 ms. Comparative experiments show that the proposed method matches state-of-the-art methods in accuracy while improving recognition efficiency.
Pages: 15
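
To make the two-stage pipeline in the abstract concrete, the following is a minimal sketch: pre-train the MobileViT backbone on macro-expression samples, keep the best weights, then classify pooled backbone features of micro-expression samples with an SVM. This is not the authors' code. It assumes a timm-style MobileViT implementation ("mobilevit_xs"); the data loaders, class count, epoch count, optimizer settings, and SVM kernel are hypothetical placeholders rather than details reported in the paper.

    import copy
    import torch
    import timm
    from sklearn.svm import SVC

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # MobileViT backbone from timm; "mobilevit_xs" and num_classes=7 (macro-expression
    # classes) are assumptions, not values reported in the abstract.
    model = timm.create_model("mobilevit_xs", pretrained=False, num_classes=7).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # illustrative setting
    criterion = torch.nn.CrossEntropyLoss()

    def evaluate(net, loader):
        """Fraction of correctly classified samples in a validation loader."""
        net.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in loader:
                preds = net(images.to(device)).argmax(dim=1).cpu()
                correct += (preds == labels).sum().item()
                total += labels.numel()
        return correct / total

    # Stage 1: transfer learning. Pre-train the backbone (whose local feature blocks
    # are MobileNetV2-style inverted residuals) on macro-expression samples and keep
    # the weights that reach the highest validation accuracy.
    best_acc, best_state = 0.0, None
    for epoch in range(50):                                     # epoch count illustrative
        model.train()
        for images, labels in macro_loader:                     # placeholder DataLoader
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            criterion(model(images), labels).backward()
            optimizer.step()
        acc = evaluate(model, macro_val_loader)                 # placeholder DataLoader
        if acc > best_acc:
            best_acc, best_state = acc, copy.deepcopy(model.state_dict())

    # Stage 2: reload the best weights and fine-tune on the composite micro-expression
    # dataset (CASME II + SAMM + SMIC); the paper selects some MobileViT hyperparameters
    # by grid search, but the concrete search space is not given in the abstract.
    model.load_state_dict(best_state)
    # ... fine-tuning loop over the micro-expression training loader omitted ...

    # Replace the softmax head with an SVM over globally pooled backbone features.
    model.eval()
    feats, targets = [], []
    with torch.no_grad():
        for images, labels in micro_train_loader:               # placeholder DataLoader
            fmap = model.forward_features(images.to(device))    # 4D (B, C, H, W) map
            feats.append(fmap.mean(dim=(2, 3)).cpu())           # global average pooling
            targets.append(labels)
    svm = SVC(kernel="rbf")                                     # kernel is an assumption
    svm.fit(torch.cat(feats).numpy(), torch.cat(targets).numpy())

At inference time, the same pooled features would be fed to svm.predict; the 35.4 ms per-sample figure reported in the abstract covers the backbone forward pass plus the SVM decision.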