Dual-Path Transformer for 3D Human Pose Estimation

Cited by: 6
Authors
Zhou, Lu [1 ]
Chen, Yingying [1 ]
Wang, Jinqiao [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Fdn Model Res Ctr, Beijing 100190, Peoples R China
[2] Wuhan AI Res, Wuhan 430073, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Keywords
Transformers; Three-dimensional displays; Pose estimation; Task analysis; Solid modeling; Feature extraction; Benchmark testing; 3D human pose estimation; transformer; motion; distillation;
DOI
10.1109/TCSVT.2023.3318557
CLC classification (Chinese Library Classification)
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline codes
0808 ; 0809 ;
Abstract
Video-based 3D human pose estimation has made great progress; however, learning a precise 2D-to-3D projection remains difficult in hard cases. Multi-level human knowledge and motion information are two key elements for overcoming these challenges: the former encodes human structure spatially, while the latter captures motion changes temporally. Motivated by this, we propose DualFormer, a dual-path transformer network that encodes multiple human contexts and motion details to perform spatial-temporal modeling. First, motion information depicting the movement of the human body is embedded to provide an explicit motion prior for the transformer module. Second, a dual-path transformer framework is proposed to model long-range dependencies over both the joint sequence and the limb sequence. Parallel context embedding is performed first, and a cross-transformer block is then appended to promote interaction between the two paths, which greatly improves feature robustness; predictions at multiple levels can thus be acquired simultaneously. Lastly, we employ weighted distillation to accelerate the convergence of the dual-path framework. We conduct extensive experiments on three benchmarks, i.e., Human3.6M, MPI-INF-3DHP, and HumanEva-I, evaluating with MPJPE, P-MPJPE, PCK, and AUC. Our work achieves competitive results compared with state-of-the-art approaches; in particular, MPJPE on Human3.6M is reduced to 42.8 mm, 1.5 mm lower than PoseFormer, which demonstrates the efficacy of the proposed approach.
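The abstract reports results in MPJPE and P-MPJPE. For reference, these standard 3D pose metrics can be sketched as below; this is a minimal NumPy sketch of the conventional definitions (mean per-joint Euclidean error, with and without rigid Procrustes alignment), not code from the paper, and the function names and 17-joint layout are illustrative assumptions.

```python
import numpy as np

def mpjpe(pred, gt):
    """MPJPE: mean Euclidean distance between predicted and
    ground-truth 3D joints. pred, gt: (J, 3) arrays (typically mm)."""
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

def p_mpjpe(pred, gt):
    """P-MPJPE: MPJPE after rigid alignment (scale, rotation,
    translation) of the prediction to the ground truth via
    orthogonal Procrustes analysis."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    X, Y = pred - mu_p, gt - mu_g           # center both poses
    U, s, Vt = np.linalg.svd(X.T @ Y)       # cross-covariance SVD
    R = U @ Vt                              # optimal rotation
    if np.linalg.det(R) < 0:                # avoid reflections
        Vt[-1] *= -1
        s[-1] *= -1
        R = U @ Vt
    scale = s.sum() / (X ** 2).sum()        # optimal uniform scale
    aligned = scale * X @ R + mu_g
    return mpjpe(aligned, gt)
```

Because P-MPJPE factors out the global similarity transform, it is never larger than MPJPE on the same pose pair, which is why papers report both.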
Pages: 3260-3270 (11 pages)