DBMHT: A double-branch multi-hypothesis transformer for 3D human pose estimation in video

Cited by: 0

Authors:

Xiang, Xuezhi ^{[1,2]}

Li, Xiaoheng ^{[1]}

Bao, Weijie ^{[1]}

Qiao, Yulong ^{[1,3]}

El Saddik, Abdulmotaleb ^{[3]}

Affiliations:

[1] Harbin Engn Univ, Sch Informat & Commun Engn, Harbin 150001, Peoples R China

[2] Minist Ind & Informat Technol, Key Lab Adv Marine Commun & Informat Technol, Harbin 150001, Peoples R China

[3] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON K1N 6N5, Canada

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2024 / Vol. 249

Funding:

National Natural Science Foundation of China; Natural Science Foundation of Heilongjiang Province;

Keywords:

3D human pose estimation; Transformer; Dual-branch; Cross-hypothesis;

DOI:

10.1016/j.cviu.2024.104147

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Estimating 3D human poses from monocular videos is challenging, and existing methods suffer from depth ambiguity and self-occlusion. To overcome these problems, we propose a Double-Branch Multi-Hypothesis Transformer (DBMHT). Specifically, we use a double-branch architecture to capture temporal and spatial information and generate multiple hypotheses. To merge these hypotheses, we adopt a lightweight module that integrates spatial and temporal representations. DBMHT can thus capture spatial information from each joint in the human body and temporal information from each frame in the video, and also merge multiple hypotheses that carry different spatio-temporal information. Comprehensive evaluation on two challenging datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrates the superior performance of DBMHT, marking it as a robust and efficient approach for accurate 3D human pose estimation (HPE) in dynamic scenarios. The results show that our model surpasses the state-of-the-art approach by 1.9% in MPJPE with ground-truth 2D keypoints as input.


Pages: 8

49 items in total

[21] HOGFormer: high-order graph convolution transformer for 3D human pose estimation
Xie, Yuhong
Hong, Chaoqun
Zhuang, Weiwei
Liu, Lijuan
Li, Jie
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2025, 16 (01) : 599 - 610
[22] ESMformer: Error-aware self-supervised transformer for multi-view 3D human pose estimation
Zhang, Lijun
Zhou, Kangkang
Lu, Feng
Li, Zhenghao
Shao, Xiaohu
Zhou, Xiang-Dong
Shi, Yu
PATTERN RECOGNITION, 2025, 158
[23] Joint multi-scale transformers and pose equivalence constraints for 3D human pose estimation
Wu, Yongpeng
Kong, Dehui
Gao, Junna
Li, Jinghua
Yin, Baocai
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 103
[24] Spatio-Temporal Dynamic Interlaced Network for 3D human pose estimation in video
Xu, Feiyi
Wang, Jifan
Sun, Ying
Qi, Jin
Dong, Zhenjiang
Sun, Yanfei
COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
[25] HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation
Cheng, Wencan
Kim, Eunji
Ko, Jong Hwan
COMPUTER VISION - ECCV 2024, PT LXXXVIII, 2025, 15146 : 35 - 52
[26] MixPose: 3D Human Pose Estimation with Mixed Encoder
Cheng, Jisheng
Cheng, Qin
Yang, Mengjie
Liu, Zhen
Zhang, Qieshi
Cheng, Jun
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 353 - 364
[27] Group Spatial Attention for 3D Human Pose Estimation
Tran, Tien-Dat
Cao, Ge
Ashraf, Russo
Jo, Kang-Hyun
2024 33RD INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS, ISIE 2024, 2024
[28] Multi-scale Feature Injection for Occluded 3D Human Pose and Shape Estimation
Shi, Yunhui
Ge, Yangyang
Wang, Jin
2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023 : 4881 - 4886
[29] HYRE: Hybrid Regressor for 3D Human Pose and Shape Estimation
Li, Wenhao
Liu, Mengyuan
Liu, Hong
Ren, Bin
Li, Xia
You, Yingxuan
Sebe, Nicu
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 235 - 246
[30] TSwinPose: Enhanced monocular 3D human pose estimation with JointFlow
Li, Muyu
Hu, Henan
Xiong, Jingjing
Zhao, Xudong
Yan, Hong
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
