U-shaped spatial-temporal transformer network for 3D human pose estimation

Cited by: 7
Authors
Yang, Honghong [1 ,2 ]
Guo, Longfei [3 ]
Zhang, Yumei [1 ,3 ]
Wu, Xiaojun [1 ,3 ]
Affiliations
[1] Shaanxi Normal Univ, Key Lab Modern Teaching Technol, Minist Educ, Xian 710062, Peoples R China
[2] Minist Culture & Tourism, Key Lab Intelligent Comp & Serv Technol Folk Song, Xian, Peoples R China
[3] Shaanxi Normal Univ, Sch Comp Sci, Xian 710062, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Human pose estimation; Spatial-temporal transformer network; Multi-scale and multi-level feature representations; NEURAL-NETWORKS;
DOI
10.1007/s00138-022-01334-6
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D human pose estimation has made considerable progress with the development of convolutional neural networks, yet accurately estimating 3D joint locations from single-view images or videos remains challenging due to depth ambiguity and severe occlusion. Motivated by the effectiveness of vision transformers in computer vision tasks, we present a novel U-shaped spatial-temporal transformer-based network (U-STN) for 3D human pose estimation. The core idea of the proposed method is to process the human joints with a multi-scale, multi-level U-shaped transformer model. We construct a multi-scale architecture with three scales based on the human skeletal topology, in which local and global features are processed across the three scales under kinematic constraints. Furthermore, a multi-level feature representation is introduced by fusing intermediate features from different depths of the U-shaped network. With skeleton-constrained pooling and unpooling operations devised for U-STN, the network can transform features across scales and extract meaningful semantic features at all levels. Experiments on two challenging benchmark datasets show that the proposed method achieves good performance on 2D-to-3D pose estimation.
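The skeleton-constrained pooling and unpooling that the abstract describes can be pictured as merging joint features into coarser body-part nodes and broadcasting them back. The sketch below is an illustrative reconstruction only, not the paper's implementation: the three scales (17 joints, 5 body parts, 1 whole body), the joint grouping, and the function names are all assumptions based on a Human3.6M-style 17-joint layout.

```python
import numpy as np

# Hypothetical grouping of 17 joints into 5 body parts (assumed, not from the paper).
PART_GROUPS = [
    [0, 7, 8, 9, 10],   # pelvis, spine, thorax, neck, head
    [1, 2, 3],          # right leg
    [4, 5, 6],          # left leg
    [11, 12, 13],       # left arm
    [14, 15, 16],       # right arm
]

def skeletal_pool(feats, groups):
    """Average joint features within each skeletal group: (J, C) -> (G, C)."""
    return np.stack([feats[g].mean(axis=0) for g in groups])

def skeletal_unpool(group_feats, groups, num_joints):
    """Broadcast each group feature back to its member joints: (G, C) -> (J, C)."""
    out = np.zeros((num_joints, group_feats.shape[1]))
    for gi, g in enumerate(groups):
        out[g] = group_feats[gi]
    return out

# Example: 17 joints with 4-dimensional features.
x = np.random.randn(17, 4)
parts = skeletal_pool(x, PART_GROUPS)             # scale 2: (5, 4)
body = parts.mean(axis=0, keepdims=True)          # scale 3: (1, 4)
x_up = skeletal_unpool(parts, PART_GROUPS, 17)    # back to (17, 4)
```

In a U-shaped network, pooling of this kind would form the contracting path and unpooling the expanding path, with transformer blocks applied at each scale and intermediate features fused across depths.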
Pages: 16
Related papers
50 records in total
  • [1] U-shaped spatial–temporal transformer network for 3D human pose estimation
    Honghong Yang
    Longfei Guo
    Yumei Zhang
    Xiaojun Wu
    Machine Vision and Applications, 2022, 33
  • [2] Multi-scale spatial-temporal transformer for 3D human pose estimation
    Wu, Yongpeng
    Gao, Junna
    2021 5TH INTERNATIONAL CONFERENCE ON VISION, IMAGE AND SIGNAL PROCESSING (ICVISP 2021), 2021, : 242 - 247
  • [3] 3D Human Pose Estimation in Video with Temporal and Spatial Transformer
    Peng, Sha
    Hu, Jiwei
    Proceedings of SPIE - The International Society for Optical Engineering, 2023, 12707
  • [4] Vertex position estimation with spatial-temporal transformer for 3D human reconstruction
    Zhang, Xiangjun
    Zheng, Yinglin
    Deng, Wenjin
    Dai, Qifeng
    Lin, Yuxin
    Shi, Wangzheng
    Zeng, Ming
    GRAPHICAL MODELS, 2023, 130
  • [5] STRFormer: Spatial-Temporal-ReTemporal Transformer for 3D human pose estimation
    Liu, Xing
    Tang, Hao
    IMAGE AND VISION COMPUTING, 2023, 140
  • [6] Weakly-Supervised 3D Human Pose Estimation With Cross-View U-Shaped Graph Convolutional Network
    Hua, Guoliang
    Liu, Hong
    Li, Wenhao
    Zhang, Qian
    Ding, Runwei
    Xu, Xin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1832 - 1843
  • [7] Kinematics-aware spatial-temporal feature transform for 3D human pose estimation
    Du, Songlin
    Yuan, Zhiwei
    Ikenaga, Takeshi
    PATTERN RECOGNITION, 2024, 150
  • [8] Hierarchical Spatial-Temporal Adaptive Graph Fusion for Monocular 3D Human Pose Estimation
    Zhang, Lijun
    Lu, Feng
    Zhou, Kangkang
    Zhou, Xiang-Dong
    Shi, Yu
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 61 - 65
  • [9] 3D Human Pose Estimation with Spatial and Temporal Transformers
    Zheng, Ce
    Zhu, Sijie
    Mendieta, Matias
    Yang, Taojiannan
    Chen, Chen
    Ding, Zhengming
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 11636 - 11645
  • [10] UViT: Efficient and lightweight U-shaped hybrid vision transformer for human pose estimation
    Li B.
    Tang S.
    Li W.
    Journal of Intelligent and Fuzzy Systems, 2024, 46 (04): : 8345 - 8359