Head Pose Estimation Based on Multi-Level Feature Fusion

Cited by: 1

Authors:

Yan, Chunman [1,2]

Zhang, Xiao [1]

Affiliations:

[1] Northwest Normal Univ, Sch Phys & Elect, Lanzhou 730070, Peoples R China

[2] Engn Res Ctr Gansu Prov Intelligent Informat Techn, Lanzhou 730070, Peoples R China

Source:

INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE | 2024, Vol. 38, No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Head pose estimation; RepVGG-A2; multi-level feature fusion; attention mechanism; loss function;

DOI:

10.1142/S0218001424560020

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Head Pose Estimation (HPE) has a wide range of applications in computer vision, but still faces challenges: (1) Existing studies commonly use Euler angles or quaternions as pose labels, which may lead to discontinuity problems. (2) Existing HPE methods do not effectively address regression via rotation matrices. (3) Recognition rates are low in complex scenes and computational requirements are high. This paper presents an improved unconstrained HPE model to address these challenges. First, a rotation matrix form is introduced to resolve ambiguous rotation labels. Second, a continuous 6D rotation matrix representation is used for efficient and robust direct regression. The lightweight RepVGG-A2 framework is used for feature extraction, and a multi-level feature fusion module and a coordinate attention mechanism with a residual connection are added to improve the network's ability to perceive contextual information and attend to relevant features. The model's accuracy is further improved by replacing the network activation function and improving the loss function. Experiments on the BIWI dataset, split 7:3 into training and test sets, show that the mean absolute error of HPE for the proposed network model is 2.41. When trained on the 300W_LP dataset and tested on the AFLW2000 and BIWI datasets, the proposed network model achieves mean absolute errors of 4.34 and 3.93, respectively. The experimental results demonstrate that the improved network has better HPE performance.
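The abstract does not spell out how a 6D vector is turned into a rotation matrix. A minimal sketch follows, assuming the widely used Gram-Schmidt construction for continuous 6D rotation representations; the function names `rotation_from_6d` and `geodesic_angle` are hypothetical, and the geodesic angle is shown only as one common rotation error measure, not necessarily the loss used in the paper.

```python
import numpy as np


def rotation_from_6d(x6):
    """Map a 6D vector to a 3x3 rotation matrix (assumed Gram-Schmidt mapping)."""
    x6 = np.asarray(x6, dtype=float)
    a1, a2 = x6[:3], x6[3:]
    # First basis vector: normalise the first 3 components.
    b1 = a1 / np.linalg.norm(a1)
    # Second basis vector: remove the b1 component from a2, then normalise.
    a2_orth = a2 - np.dot(b1, a2) * b1
    b2 = a2_orth / np.linalg.norm(a2_orth)
    # Third basis vector: the cross product completes a right-handed frame.
    b3 = np.cross(b1, b2)
    # The basis vectors form the columns of the rotation matrix.
    return np.stack([b1, b2, b3], axis=1)


def geodesic_angle(r1, r2):
    """Angle in radians of the relative rotation between two rotation matrices."""
    cos_theta = (np.trace(r1.T @ r2) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


if __name__ == "__main__":
    # An arbitrary, unnormalised network output still yields a valid rotation.
    r = rotation_from_6d([0.3, -1.2, 0.5, 2.0, 0.1, -0.4])
    print(np.allclose(r.T @ r, np.eye(3)), np.isclose(np.linalg.det(r), 1.0))
    print(geodesic_angle(np.eye(3), r))
```

Because any pair of non-parallel 3-vectors maps to a valid rotation, a network can regress the six values directly without an explicit orthogonality constraint, which is the practical appeal of this representation over Euler angles or quaternions.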


Pages: 23
