MixPose: 3D Human Pose Estimation with Mixed Encoder

Cited by: 0
Authors
Cheng, Jisheng [1 ,2 ,3 ]
Cheng, Qin [1 ,3 ]
Yang, Mengjie [4 ]
Liu, Zhen [1 ,3 ]
Zhang, Qieshi [1 ,3 ]
Cheng, Jun [1 ,3 ]
Affiliations
[1] Chinese Acad Sci, Guangdong Prov Key Lab Robot & Intelligent Syst, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[2] Univ Chinese Acad Beijing, Beijing, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Shine Technol Co Ltd, Beijing, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII | 2024 / Vol. 14432
Funding
National Natural Science Foundation of China;
Keywords
3D human pose estimation; Transformer; Mixed encoder;
DOI
10.1007/978-981-99-8543-2_29
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The fusion of spatio-temporal information is crucial for 3D human pose estimation in video. Existing methods usually extract temporal information from the spatially encoded poses, which may limit the interaction between spatial and temporal information. To address this issue, we propose MixPose, a novel network with a mixed encoder for 3D human pose estimation in videos. We introduce independent mixed encoders to fuse spatio-temporal information in the sequence, and augment the perception of each point with global information using an attention module. We evaluate MixPose on two public datasets, Human3.6M and HumanEva; experimental results show that MixPose outperforms other state-of-the-art methods in specific scenarios.
Pages: 353 - 364
Page count: 12