Sparse landmarks for facial action unit detection using vision transformer and perceiver

Cited by: 0

Authors:

Cakir, Duygu [1]

Yilmaz, Gorkem [2]

Arica, Nafiz [3]

Affiliations:

[1] Bahcesehir Univ, Fac Engn & Nat Sci, Dept Software Engn, Istanbul, Turkiye

[2] Bahcesehir Univ, Fac Engn & Nat Sci, Dept Comp Engn, Istanbul, Turkiye

[3] Piri Reis Univ, Fac Engn, Dept Informat Syst Engn, Istanbul, Turkiye

Source:

INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING | 2024 / Vol. 27 / Issue 05

Keywords:

action unit detection; sparse learning; vision transformer; perceiver; RECOGNITION; PATCHES;

DOI:

10.1504/IJCSE.2023.10060451

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

The ability to accurately detect facial expressions, represented by facial action units (AUs), holds significant implications across diverse fields such as mental health diagnosis, security, and human-computer interaction. Although earlier approaches have made progress, the burgeoning complexity of facial actions demands more nuanced, computationally efficient techniques. This study pioneers the integration of sparse learning with vision transformer (ViT) and perceiver networks, focusing on the most active and descriptive landmarks for AU detection across both controlled (DISFA, BP4D) and in-the-wild (EmotioNet) datasets. Our novel approach, employing active landmark patches instead of the whole face, not only attains state-of-the-art performance but also uncovers insights into the differing attention mechanisms of ViT and perceiver. This fusion of techniques marks a significant advancement in facial analysis, potentially reshaping strategies in noise reduction and patch optimisation, setting a robust foundation for future research in the domain.

Pages: 607 - 620

Page count: 15

50 items in total

[21] Efficient deepfake detection using shallow vision transformer
Shaheen Usmani
Sunil Kumar
Debanjan Sadhya
Multimedia Tools and Applications, 2024, 83 : 12339 - 12362
[22] Fire detection using vision transformer on power plant
Zhang, Kaidi
Wang, Binjun
Tong, Xin
Liu, Keke
ENERGY REPORTS, 2022, 8 : 657 - 664
[23] Multi-modality Empowered Network for Facial Action Unit Detection
Liu, Peng
Zhang, Zheng
Yang, Huiyuan
Yin, Lijun
2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019 : 2175 - 2184
[24] An Efficient Real-Time Emotion Detection Using Camera and Facial Landmarks
Nguyen, Binh T.
Trinh, Minh H.
Phan, Tan V.
Nguyen, Hien D.
2017 SEVENTH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST2017), 2017 : 251 - 255
[25] An Intrusion Detection System Using Vision Transformer for Representation Learning
Ban, Xinbo
Liu, Ao
He, Long
Gong, Li
FRONTIERS IN CYBER SECURITY, FCS 2023, 2024, 1992 : 531 - 544
[26] Improved Deepfake Video Detection Using Convolutional Vision Transformer
Deressa, Deressa Wodajo
Lambert, Peter
Van Wallendael, Glenn
Atnafu, Solomon
Mareen, Hannes
2024 IEEE GAMING, ENTERTAINMENT, AND MEDIA CONFERENCE, GEM 2024, 2024 : 492 - 497
[27] Explainable Anomaly Detection Using Vision Transformer Based SVDD
Baek, Ji-Won
Chung, Kyungyong
CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (03) : 6573 - 6586
[28] Action unit detection in 3D facial videos with application in facial expression retrieval and recognition
Danelakis, Antonios
Theoharis, Theoharis
Pratikakis, Ioannis
MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (19) : 24813 - 24841
[29] Action unit detection in 3D facial videos with application in facial expression retrieval and recognition
Antonios Danelakis
Theoharis Theoharis
Ioannis Pratikakis
Multimedia Tools and Applications, 2018, 77 : 24813 - 24841
[30] An efficient approach for facial action unit intensity detection using distance metric learning based on cosine similarity
Rathee, Neeru
Ganotra, Dinesh
SIGNAL IMAGE AND VIDEO PROCESSING, 2018, 12 (06) : 1141 - 1148