Fine-Tuning Swin Transformer and Multiple Weights Optimality-Seeking for Facial Expression Recognition

Cited by: 13
Authors
Feng, Hongqi [1 ]
Huang, Weikai [1 ]
Zhang, Denghui [2 ]
Zhang, Bangze [3 ]
Affiliations
[1] Changzhou Univ, Sch Comp Sci & Artificial Intelligence, Changzhou 213100, Peoples R China
[2] Zhejiang Shuren Univ, Coll Informat Technol, Hangzhou 310000, Peoples R China
[3] Zhejiang Univ Technol, Sch Comp Sci & Technol, Hangzhou 310023, Peoples R China
Keywords
Transformers; Face recognition; Merging; Data models; Feature extraction; Training data; Task analysis; Facial expression recognition; greedy strategy; multiple weights optimality-seeking; Swin Transformer
DOI
10.1109/ACCESS.2023.3237817
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Facial expression recognition plays a key role in human-computer emotional interaction. However, human faces in real environments are affected by various unfavorable factors, which reduce expression recognition accuracy. In this paper, we propose a novel method that combines Fine-tuning Swin Transformer and Multiple Weights Optimality-seeking (FST-MWOS) to enhance expression recognition performance. FST-MWOS consists of two crucial components: Fine-tuning Swin Transformer (FST) and Multiple Weights Optimality-seeking (MWOS). FST takes Swin Transformer Large as the backbone network and obtains multiple groups of fine-tuned model weights for the homologous data domains by varying hyperparameter configurations, data augmentation methods, etc. In MWOS, a greedy strategy is used to mine locally optimal generalizations within the optimal epoch interval of each group of fine-tuned model weights. Optimality-seeking over the multiple groups of locally optimal weights is then used to obtain the globally optimal solution. Experimental results on the RAF-DB, FERPlus, and AffectNet datasets show that the proposed FST-MWOS method outperforms various state-of-the-art methods.
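The greedy optimality-seeking over checkpoint weights described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes hypothetical inputs, namely that each checkpoint is a plain weight dictionary paired with its validation accuracy, and that an `evaluate` callback scores an averaged weight set. Candidates are tried in descending order of individual accuracy and kept only if they improve the averaged model:

```python
def average_weights(weight_dicts):
    """Element-wise mean of a list of weight dictionaries."""
    n = len(weight_dicts)
    return {k: sum(w[k] for w in weight_dicts) / n for k in weight_dicts[0]}

def greedy_weight_search(checkpoints, evaluate):
    """Greedily accumulate checkpoints into a weight average.

    checkpoints: list of (weights, val_acc) pairs from fine-tuning epochs.
    evaluate: callable mapping a weight dict to validation accuracy.
    Returns (averaged_weights, best_accuracy).
    """
    # Try the individually strongest checkpoints first.
    ranked = sorted(checkpoints, key=lambda c: c[1], reverse=True)
    kept = [ranked[0][0]]
    best_acc = evaluate(average_weights(kept))
    for weights, _ in ranked[1:]:
        candidate = kept + [weights]
        acc = evaluate(average_weights(candidate))
        if acc >= best_acc:  # keep a checkpoint only if it helps the average
            kept, best_acc = candidate, acc
    return average_weights(kept), best_acc
```

The same routine can be applied twice, mirroring the two stages sketched in the abstract: first within each fine-tuning run's optimal epoch interval (local search), then across the resulting locally optimal weight groups (global search).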
Pages: 9995-10003
Number of pages: 9