Spatio-Temporal Dynamic Interlaced Network for 3D human pose estimation in video

Times cited: 0
Authors
Xu, Feiyi [1]
Wang, Jifan [2]
Sun, Ying [2]
Qi, Jin [1]
Dong, Zhenjiang [2]
Sun, Yanfei [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Internet Things, 9 Wenyuan Rd, Nanjing 210023, Jiangsu, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Sch Comp Sci, 9 Wenyuan Rd, Nanjing 210023, Jiangsu, Peoples R China
Keywords
3D human pose estimation; Graph convolutional network; Transformer; Spatio-temporal feature
DOI
10.1016/j.cviu.2024.104258
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent transformer-based methods have achieved excellent performance in 3D human pose estimation. A distinguishing characteristic of the transformer is that it treats every token equally and encodes each one independently. When applied to the human skeleton, the transformer therefore regards each joint as an equally significant token, which can blur the connection relationships between joints and reduce the accuracy of the extracted relational information. The transformer likewise treats every frame of a temporal sequence equally. In short frame sequences with frequent action changes, this design can introduce substantial redundant information, which hampers the learning of temporal correlations. To alleviate these issues, we propose an end-to-end framework, the Spatio-Temporal Dynamic Interlaced Network (S-TDINet), which consists of a dynamic spatial GCN encoder (DSGCE) and an interlaced temporal transformer encoder (ITTE). In the DSGCE module, we design three adaptive adjacency matrices to model spatial correlations from static and dynamic perspectives. In the ITTE module, we introduce a global-local interlaced mechanism that mitigates interference from redundant information in fast-motion scenarios, enabling more accurate temporal correlation modeling. Extensive experiments on two widely used benchmark datasets, Human3.6M and MPI-INF-3DHP, validate the effectiveness of our approach.
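The abstract names two mechanisms without giving their equations: adaptive adjacency matrices that combine static and dynamic joint relations, and a temporal transformer that interlaces global and local attention. The NumPy snippet below is a minimal sketch of both ideas under stated assumptions; it is not the authors' S-TDINet implementation. The adjacency follows the A + B + C form used by adaptive GCNs such as 2s-AGCN (a fixed normalized skeleton graph, a learnable matrix, and an input-dependent matrix computed from embedded joint features). The even/odd alternation between full and sliding-window attention masks, the window size, the toy chain skeleton, and all weight names (`w_theta`, `w_phi`, `w_gcn`, `w_in`, `w_q`, `w_k`, `w_v`) are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def normalize_adjacency(a):
    # Static part: symmetric normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph.
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def adaptive_adjacency(x, a_static, b_learned, w_theta, w_phi):
    # x: (J, C) joint features of one frame.
    # Returns A + B + C: fixed graph + learnable matrix + input-dependent matrix.
    theta, phi = x @ w_theta, x @ w_phi              # (J, C') embeddings (hypothetical)
    c_dynamic = softmax(theta @ phi.T, axis=-1)      # (J, J), each row sums to 1
    return a_static + b_learned + c_dynamic


def gcn_layer(x, adj, w):
    # One graph convolution with ReLU: aggregate neighbours, then project.
    return np.maximum(adj @ x @ w, 0.0)


def interlaced_mask(num_frames, window, layer_idx):
    # Hypothetical interlacing: even layers attend globally, odd layers locally.
    if layer_idx % 2 == 0:
        return np.ones((num_frames, num_frames), dtype=bool)
    idx = np.arange(num_frames)
    return np.abs(idx[:, None] - idx[None, :]) <= window


def masked_self_attention(x, mask, w_q, w_k, w_v):
    # Scaled dot-product attention over frames; False entries in mask are blocked.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores, axis=-1) @ v


rng = np.random.default_rng(0)
T, J, C, d = 27, 17, 8, 16                           # frames, joints, channels, model width

# Toy chain skeleton (joint i linked to i + 1); NOT the Human3.6M topology.
a = np.zeros((J, J))
for i in range(J - 1):
    a[i, i + 1] = a[i + 1, i] = 1.0
a_static = normalize_adjacency(a)
b_learned = np.zeros((J, J))                         # would be trained in practice
w_theta, w_phi = rng.standard_normal((C, 4)), rng.standard_normal((C, 4))
w_gcn = rng.standard_normal((C, C)) / np.sqrt(C)

# Spatial stage: per-frame GCN with an input-dependent adjacency.
frames = rng.standard_normal((T, J, C))
spatial = np.stack([
    gcn_layer(f, adaptive_adjacency(f, a_static, b_learned, w_theta, w_phi), w_gcn)
    for f in frames
])                                                   # (T, J, C)

# Temporal stage: one token per frame, alternating global and local masks.
w_in = rng.standard_normal((J * C, d)) / np.sqrt(J * C)
h = spatial.reshape(T, J * C) @ w_in                 # (T, d)
for layer in range(4):
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    mask = interlaced_mask(T, window=3, layer_idx=layer)
    h = h + masked_self_attention(h, mask, w_q, w_k, w_v)   # residual connection

print(h.shape)  # (27, 16)
```

Restricting every other layer to a small window is one way to limit how much distant, possibly redundant frames contribute, while the global layers keep long-range context. Whether S-TDINet interlaces its layers in this particular way is not stated in the abstract.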
Pages: 10