Orientation Cues-Aware Facial Relationship Representation for Head Pose Estimation via Transformer

Cited by: 76
Authors
Liu, Hai [1]
Zhang, Cheng [1]
Deng, Yongjian [2]
Liu, Tingting [3, 4]
Zhang, Zhaoli [1]
Li, You-Fu [4]
Affiliations
[1] Cent China Normal Univ, Natl Engn Res Ctr Elearning, Wuhan 430079, Peoples R China
[2] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[3] Hubei Univ, Sch Educ, Wuhan 430062, Hubei, Peoples R China
[4] City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China
Keywords
Head; Transformers; Visualization; Computer architecture; Pose estimation; Task analysis; Semantics; Head pose estimation; attention mechanism; relationship perception; deep learning; transformer
DOI
10.1109/TIP.2023.3331309
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Head pose estimation (HPE) is an indispensable upstream task in the fields of human-machine interaction, self-driving, and attention detection. However, practical head pose applications suffer from several challenges, such as severe occlusion, low illumination, and extreme orientations. To address these challenges, we identify three cues from head images, namely, critical minority relationships, neighborhood orientation relationships, and significant facial changes. On the basis of these three cues, two key insights on head poses are revealed: 1) the intra-orientation relationship and 2) the cross-orientation relationship. To leverage these two insights, a novel relationship-driven method, TokenHPE, is proposed based on the Transformer architecture, in which facial and orientation relationships can be learned. Specifically, we design several orientation tokens to explicitly encode basic orientation regions. In addition, a novel token guide multi-loss function is designed to guide the orientation tokens as they learn the desired regional similarities and relationships. Experimental results on three challenging benchmark HPE datasets show that the proposed TokenHPE achieves state-of-the-art performance. Moreover, qualitative visualizations are provided to verify the effectiveness of the token-learning methodology.
Pages: 6289-6302
Page count: 14
Related papers
55 in total
  • [1] Head pose estimation: An extensive survey on recent techniques and applications
    Abate, Andrea F.
    Bisogni, Carmen
    Castiglione, Aniello
    Nappi, Michele
    [J]. PATTERN RECOGNITION, 2022, 127
  • [2] Head pose estimation by regression algorithm
    Abate, Andrea F.
    Barra, Paola
    Pero, Chiara
    Tucci, Maurizio
    [J]. PATTERN RECOGNITION LETTERS, 2020, 140 : 179 - 185
  • [3] img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation
    Albiero, Vitor
    Chen, Xingyu
    Yin, Xi
    Pang, Guan
    Hassner, Tal
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021 : 7613 - 7623
  • [4] Web-Shaped Model for Head Pose Estimation: An Approach for Best Exemplar Selection
    Barra, Paola
    Barra, Silvio
    Bisogni, Carmen
    De Marsico, Maria
    Nappi, Michele
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 5457 - 5468
  • [5] Bicho D, 2019, IEEE SYS MAN CYBERN, P2645, DOI 10.1109/SMC.2019.8914350
  • [6] FASHE: A FrActal Based Strategy for Head Pose Estimation
    Bisogni, Carmen
    Nappi, Michele
    Pero, Chiara
    Ricciardi, Stefano
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 3192 - 3203
  • [7] How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)
    Bulat, Adrian
    Tzimiropoulos, Georgios
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017 : 1021 - 1030
  • [8] A Vector-based Representation to Enhance Head Pose Estimation
    Cao, Zhiwen
    Chu, Zongcheng
    Liu, Dongfang
    Chen, Yingjie
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021 : 1187 - 1196
  • [9] Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency
    Chong, Eunji
    Ruiz, Nataniel
    Wang, Yongxin
    Zhang, Yun
    Rozga, Agata
    Rehg, James M.
    [J]. COMPUTER VISION - ECCV 2018, PT V, 2018, 11209 : 397 - 412
  • [10] Cordonnier JB, 2020, Arxiv, DOI arXiv:1911.03584