BCDPose: Diffusion-based 3D Human Pose Estimation with bone-chain prior knowledge

Cited by: 0

Authors:

Liu, Xing [1]

Tang, Hao [2]

Affiliations:

[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 201804, Peoples R China

[2] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China

Source:

IMAGE AND VISION COMPUTING | 2025 / Vol. 162

Funding:

National Natural Science Foundation of China;

Keywords:

3D human pose estimation; Diffusion model; Bone-chain enhanced denoiser; Joint-DoF hierarchical temporal embedding; TRANSFORMER;

DOI:

10.1016/j.imavis.2025.105636

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, diffusion-based methods have emerged as the gold standard for 3D human pose estimation, largely thanks to their strong generative capabilities. Prior work has focused on building spatial and temporal denoisers from Transformer blocks. However, existing Transformer-based denoisers often overlook the implicit structural and kinematic supervision available from prior knowledge of human biomechanics, namely the bone-chain structure and joint kinematics, which could otherwise improve performance. We argue that joint movements are intrinsically constrained by neighboring joints within the bone-chain and by kinematic hierarchies. We therefore propose BCDPose, a Bone-Chain enhanced Diffusion method for 3D pose estimation. BCDPose introduces novel Transformer blocks, enhanced with bone-chain prior knowledge, within the denoiser to reconstruct contaminated 3D pose data. Additionally, we propose the Joint-DoF Hierarchical Temporal Embedding framework, which incorporates prior knowledge of joint kinematics. By integrating body hierarchy and temporal dependencies, this framework captures the complexity of human motion, enabling accurate and robust pose estimation. By explicitly modeling joint kinematics, it addresses a limitation of prior methods that fail to capture the intrinsic dynamics of human motion. We conduct extensive experiments on several open benchmarks to evaluate BCDPose. The results show that BCDPose is highly competitive with other state-of-the-art models, which underscores its potential and practical applicability to 2D-to-3D human pose estimation. We plan to release our code publicly for further research.

Pages: 10
