DBMHT: A double-branch multi-hypothesis transformer for 3D human pose estimation in video

Times Cited: 0
Authors
Xiang, Xuezhi [1,2]
Li, Xiaoheng [1]
Bao, Weijie [1]
Qiao, Yulong [1,3]
El Saddik, Abdulmotaleb [3]
Affiliations
[1] Harbin Engineering University, School of Information & Communication Engineering, Harbin 150001, People's Republic of China
[2] Ministry of Industry and Information Technology, Key Laboratory of Advanced Marine Communication and Information Technology, Harbin 150001, People's Republic of China
[3] University of Ottawa, School of Electrical Engineering and Computer Science, Ottawa, ON K1N 6N5, Canada
Funding
National Natural Science Foundation of China; Natural Science Foundation of Heilongjiang Province;
Keywords
3D human pose estimation; Transformer; Dual-branch; Cross-hypothesis;
DOI
10.1016/j.cviu.2024.104147
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Estimating 3D human poses from monocular video is a significant challenge: existing methods struggle with depth ambiguity and self-occlusion. To address these problems, we propose a Double-Branch Multi-Hypothesis Transformer (DBMHT). Specifically, we employ a double-branch architecture to capture temporal and spatial information and to generate multiple hypotheses, and we adopt a lightweight module that integrates the spatial and temporal representations to merge these hypotheses. DBMHT can not only capture spatial information from each joint of the human body and temporal information from each frame of the video, but also merge multiple hypotheses that carry different spatio-temporal information. Comprehensive evaluation on two challenging datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrates the superior performance of DBMHT, marking it as a robust and efficient approach for accurate 3D HPE in dynamic scenarios. The results show that our model surpasses the state-of-the-art approach by 1.9% in MPJPE with ground-truth 2D keypoints as input.
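The abstract's pipeline (two attention branches over joints and over frames, per-hypothesis pose regression, then a lightweight fusion of the hypotheses) can be sketched as follows. This is a minimal illustration of the general idea only: the identity Q/K/V projections, the random stand-in regression heads, and the averaging fusion are assumptions for brevity, not the authors' learned modules.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head self-attention with identity projections (shape illustration only).
    scores = softmax(x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1]))
    return scores @ x

def dbmht_sketch(seq, n_hyp=3, seed=0):
    """seq: (frames, joints, channels) lifted 2D keypoint features.
    Returns a fused 3D pose for the centre frame, shape (joints, 3)."""
    f, j, c = seq.shape
    rng = np.random.default_rng(seed)
    hypotheses = []
    for _ in range(n_hyp):
        # Spatial branch: attention across joints within each frame.
        spatial = self_attention(seq)                       # (f, j, c)
        # Temporal branch: attention across frames for each joint.
        temporal = self_attention(seq.transpose(1, 0, 2))   # (j, f, c)
        temporal = temporal.transpose(1, 0, 2)              # (f, j, c)
        # Each hypothesis regresses 3D joints with its own (stand-in) head.
        feat = np.concatenate([spatial, temporal], axis=-1)  # (f, j, 2c)
        head = rng.standard_normal((2 * c, 3)) / np.sqrt(2 * c)
        hypotheses.append(feat[f // 2] @ head)               # (j, 3)
    # Lightweight fusion: plain averaging stands in for the learned merge module.
    return np.mean(hypotheses, axis=0)

pose3d = dbmht_sketch(np.random.default_rng(1).standard_normal((9, 17, 8)))
print(pose3d.shape)  # (17, 3): one 3D position per Human3.6M-style joint
```

In the paper's design the fusion module is learned rather than a plain mean, which lets hypotheses with different spatio-temporal evidence be weighted adaptively.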
Pages: 8