Utilizing motion segmentation for optimizing the temporal adjacency matrix in 3D human pose estimation

Times cited: 0

Authors:

Wang, Yingfeng [1]

Li, Muyu [3]

Yan, Hong [1,2]

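Affiliations:

[1] Hong Kong Science Park, Centre for Intelligent Multidimensional Data Analysis, Hong Kong, People's Republic of China

[2] City University of Hong Kong, Department of Electrical Engineering, Hong Kong, People's Republic of China

[3] Dalian University of Technology, Institute of Intelligent Science and Technology, School of Control Science and Engineering, Dalian, People's Republic of China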

Source:

NEUROCOMPUTING | 2024 / Vol. 600

Keywords:

3D human pose estimation; Temporal adjacency matrix; Motion segmentation

DOI:

10.1016/j.neucom.2024.128153

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In monocular 3D human pose estimation, modeling the temporal relations of human joints is crucial for prediction accuracy. Most current methods use transformers to model the temporal relations among joints, but existing transformer-based methods have a limitation: the temporal adjacency matrix used in the self-attention of the temporal transformer models inter-frame relationships inaccurately, particularly when distinct motions exhibit strong correlation even though they have different physical interpretations and are separated by large temporal spans. To address this issue, we construct an artificial temporal adjacency matrix from the input data and introduce a temporal adjacency matrix hybrid module that blends this matrix with the model's inherent temporal adjacency matrix, yielding a novel composite temporal adjacency matrix. Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets, using state-of-the-art methods as baselines, show that the proposed method improves on the original approach by up to 5.6%.
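The abstract describes the composite temporal adjacency matrix only at a high level. The following is a minimal sketch under explicit assumptions, not the authors' implementation: a simple velocity-threshold rule stands in for motion segmentation, the artificial matrix links each frame uniformly to the frames of its own segment, and a fixed convex blend with a hypothetical weight `alpha` stands in for the hybrid module, whose actual design is not given here. All function names (`segment_frames`, `artificial_adjacency`, `attention_adjacency`, `hybrid_adjacency`, `apply_adjacency`) and the toy input are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def segment_frames(frames: Matrix, threshold: float) -> List[int]:
    """HYPOTHETICAL stand-in for motion segmentation: start a new segment
    whenever the Euclidean displacement between consecutive frames exceeds
    `threshold`. Returns one segment label per frame."""
    labels = [0]
    for prev, cur in zip(frames, frames[1:]):
        step = math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur)))
        labels.append(labels[-1] + 1 if step > threshold else labels[-1])
    return labels


def artificial_adjacency(labels: List[int]) -> Matrix:
    """Row-normalized matrix linking each frame only to frames of its own segment."""
    n = len(labels)
    adj = []
    for i in range(n):
        same = [1.0 if labels[j] == labels[i] else 0.0 for j in range(n)]
        total = sum(same)  # >= 1 because a frame always matches itself
        adj.append([v / total for v in same])
    return adj


def attention_adjacency(q: Matrix, k: Matrix) -> Matrix:
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)) per row."""
    d = len(q[0])
    adj = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        adj.append([e / z for e in exps])
    return adj


def hybrid_adjacency(att: Matrix, art: Matrix, alpha: float) -> Matrix:
    """Convex blend of the two matrices; `alpha` is a HYPOTHETICAL fixed weight."""
    return [[(1 - alpha) * a + alpha * b for a, b in zip(ra, rb)]
            for ra, rb in zip(att, art)]


def apply_adjacency(adj: Matrix, v: Matrix) -> Matrix:
    """Aggregate value vectors: out[i] = sum_j adj[i][j] * v[j]."""
    return [[sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
            for row in adj]


# Toy usage: five 2-D "pose features", with a large jump between frames 2 and 3.
frames = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 3.0]]
labels = segment_frames(frames, threshold=1.0)
art = artificial_adjacency(labels)
att = attention_adjacency(frames, frames)  # toy: Q = K = raw frame features
mixed = hybrid_adjacency(att, art, alpha=0.5)
out = apply_adjacency(mixed, frames)
```

Because both the softmax attention matrix and the segment-based matrix are row-stochastic, any convex blend of them is also row-stochastic, so each output frame remains a weighted average of the value vectors while attention to frames from other motion segments is damped.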

Pages: 12

References: 57 in total
