Orientation Cues-Aware Facial Relationship Representation for Head Pose Estimation via Transformer

Cited by: 76

Authors:

Liu, Hai [1]

Zhang, Cheng [1]

Deng, Yongjian [2]

Liu, Tingting [3,4]

Zhang, Zhaoli [1]

Li, You-Fu [4]

Affiliations:

[1] Cent China Normal Univ, Natl Engn Res Ctr Elearning, Wuhan 430079, Peoples R China

[2] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China

[3] Hubei Univ, Sch Educ, Wuhan 430062, Hubei, Peoples R China

[4] City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2023 / Vol. 32

Keywords:

Head; Transformers; Visualization; Computer architecture; Pose estimation; Task analysis; Semantics; Head pose estimation; attention mechanism; relationship perception; deep learning; transformer;

DOI:

10.1109/TIP.2023.3331309

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Head pose estimation (HPE) is an indispensable upstream task in the fields of human-machine interaction, self-driving, and attention detection. However, practical head pose applications suffer from several challenges, such as severe occlusion, low illumination, and extreme orientations. To address these challenges, we identify three cues from head images, namely, critical minority relationships, neighborhood orientation relationships, and significant facial changes. On the basis of the three cues, two key insights on head poses are revealed: 1) intra-orientation relationship and 2) cross-orientation relationship. To leverage these two insights, a novel relationship-driven method is proposed based on the Transformer architecture, in which facial and orientation relationships can be learned. Specifically, we design several orientation tokens to explicitly encode basic orientation regions. In addition, a novel token guide multi-loss function is designed to guide the orientation tokens as they learn the desired regional similarities and relationships. Experimental results on three challenging benchmark HPE datasets show that our proposed TokenHPE achieves state-of-the-art performance. Moreover, qualitative visualizations are provided to verify the effectiveness of the token-learning methodology.


Pages: 6289-6302

Page count: 14
