Fine-Tuning Swin Transformer and Multiple Weights Optimality-Seeking for Facial Expression Recognition

Cited by: 13
Authors
Feng, Hongqi [1 ]
Huang, Weikai [1 ]
Zhang, Denghui [2 ]
Zhang, Bangze [3 ]
Affiliations
[1] Changzhou Univ, Sch Comp Sci & Artificial Intelligence, Changzhou 213100, Peoples R China
[2] Zhejiang Shuren Univ, Coll Informat Technol, Hangzhou 310000, Peoples R China
[3] Zhejiang Univ Technol, Sch Comp Sci & Technol, Hangzhou 310023, Peoples R China
Keywords
Transformers; Face recognition; Merging; Data models; Feature extraction; Training data; Task analysis; Facial expression recognition; greedy strategy; multiple weights optimality-seeking; Swin Transformer
DOI
10.1109/ACCESS.2023.3237817
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Facial expression recognition plays a key role in human-computer emotional interaction. However, human faces in real environments are affected by various unfavorable factors, which reduce expression recognition accuracy. In this paper, we propose a novel method that combines Fine-tuning Swin Transformer and Multiple Weights Optimality-seeking (FST-MWOS) to enhance expression recognition performance. FST-MWOS consists of two crucial components: Fine-tuning Swin Transformer (FST) and Multiple Weights Optimality-seeking (MWOS). FST takes Swin Transformer Large as the backbone network and obtains multiple groups of fine-tuned model weights for the homologous data domains by varying hyperparameter configurations, data augmentation methods, etc. In MWOS, a greedy strategy is used to mine locally optimal generalizations within the optimal epoch interval of each group of fine-tuned model weights. Optimality-seeking over the multiple groups of locally optimal weights is then used to obtain the globally optimal solution. Experimental results on the RAF-DB, FERPlus, and AffectNet datasets show that the proposed FST-MWOS method outperforms various state-of-the-art methods.
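The greedy optimality-seeking over checkpoint weights described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes hypothetical inputs, namely that each checkpoint is a plain weight dictionary paired with its validation accuracy, and that an `evaluate` callback scores an averaged weight set. Candidates are tried in descending order of individual accuracy and kept only if they improve the averaged model:

```python
def average_weights(weight_dicts):
    """Element-wise mean of a list of weight dictionaries."""
    n = len(weight_dicts)
    return {k: sum(w[k] for w in weight_dicts) / n for k in weight_dicts[0]}

def greedy_weight_search(checkpoints, evaluate):
    """Greedily accumulate checkpoints into a weight average.

    checkpoints: list of (weights, val_acc) pairs from fine-tuning epochs.
    evaluate: callable mapping a weight dict to validation accuracy.
    Returns (averaged_weights, best_accuracy).
    """
    # Try the individually strongest checkpoints first.
    ranked = sorted(checkpoints, key=lambda c: c[1], reverse=True)
    kept = [ranked[0][0]]
    best_acc = evaluate(average_weights(kept))
    for weights, _ in ranked[1:]:
        candidate = kept + [weights]
        acc = evaluate(average_weights(candidate))
        if acc >= best_acc:  # keep a checkpoint only if it helps the average
            kept, best_acc = candidate, acc
    return average_weights(kept), best_acc
```

The same routine can be applied twice, mirroring the two stages sketched in the abstract: first within each fine-tuning run's optimal epoch interval (local search), then across the resulting locally optimal weight groups (global search).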
Pages: 9995-10003
Number of pages: 9