Bidirectional temporal feature for 3D human pose and shape estimation from a video

Times cited: 9
Authors
Sun, Libo [1 ,2 ]
Tang, Ting [1 ]
Qu, Yuke [1 ]
Qin, Wenhu [1 ,2 ]
Affiliations
[1] Southeast University, School of Instrument Science and Engineering, Nanjing, People's Republic of China
[2] Southeast University, School of Instrument Science and Engineering, Nanjing 210096, People's Republic of China
Keywords
Bi-LSTM; human pose and shape estimation; transformer;
DOI
10.1002/cav.2187
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Subject classification code
081202 ; 0835 ;
Abstract
3D human pose and shape estimation is fundamental to human motion analysis. However, estimating accurate and temporally consistent 3D human motion from a video remains challenging. To date, most video-based methods for 3D human pose and shape estimation rely on unidirectional temporal features and therefore lack more comprehensive temporal information. To solve this problem, we propose a novel model, "bidirectional temporal feature for human motion recovery" (BTMR), which consists of a human motion generator and a motion discriminator. The transformer-based generator captures both forward and reverse temporal features, which strengthens the temporal correlation between frames and reduces the loss of spatial information. The Bi-LSTM-based motion discriminator distinguishes whether the generated pose sequences are consistent with the realistic motion sequences of the AMASS dataset. Through this adversarial process of generation and discrimination, the model learns more realistic and accurate poses. We evaluate BTMR on the 3DPW and MPI-INF-3DHP datasets. Without using the 3DPW training set, BTMR outperforms VIBE by 2.4 mm in PA-MPJPE and 14.9 mm/s² in Accel, and outperforms TCMR by 1.7 mm in PA-MPJPE on 3DPW. The results demonstrate that BTMR produces more accurate and temporally consistent 3D human motion.
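The results above are reported in PA-MPJPE (mean per-joint position error after Procrustes alignment) and Accel (acceleration error). The following is a minimal sketch of how these two metrics are commonly computed; it is not the authors' evaluation code. It assumes NumPy, joint arrays of shape (T, J, 3) in millimetres, and a per-frame similarity (scale, rotation, translation) Procrustes alignment. The function names are hypothetical, acceleration is approximated by second-order finite differences over frames, and any frame-rate scaling a particular benchmark applies is deliberately left out.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Similarity-align one frame of predicted joints to ground truth.

    pred, gt: arrays of shape (J, 3). Returns pred after the scale,
    rotation and translation that best fit it to gt in least squares.
    """
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance (Kabsch/Umeyama solution).
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    z = np.diag([1.0, 1.0, d])
    rot = vt.T @ z @ u.T
    scale = np.sum(s * np.diag(z)) / np.sum(p ** 2)
    return scale * p @ rot.T + mu_g


def pa_mpjpe(pred_seq, gt_seq):
    """Mean per-joint position error after per-frame Procrustes alignment.

    pred_seq, gt_seq: arrays of shape (T, J, 3).
    """
    errors = [
        np.linalg.norm(procrustes_align(p, g) - g, axis=-1)
        for p, g in zip(pred_seq, gt_seq)
    ]
    return float(np.mean(errors))


def accel_error(pred_seq, gt_seq):
    """Mean L2 distance between predicted and ground-truth joint accelerations.

    Acceleration is the second-order finite difference over frames,
    x[t-1] - 2 x[t] + x[t+1]; no frame-rate scaling is applied here.
    """
    acc_pred = pred_seq[:-2] - 2 * pred_seq[1:-1] + pred_seq[2:]
    acc_gt = gt_seq[:-2] - 2 * gt_seq[1:-1] + gt_seq[2:]
    return float(np.linalg.norm(acc_pred - acc_gt, axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(5, 14, 3))  # 5 frames, 14 joints
    c, s_ = np.cos(0.3), np.sin(0.3)
    rot_z = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])
    # A globally rotated, scaled and shifted copy of gt: PA-MPJPE is ~0.
    pred = 0.5 * gt @ rot_z.T + np.array([10.0, -3.0, 2.0])
    print(pa_mpjpe(pred, gt))
    # Per-frame jitter raises the acceleration error.
    jittery = gt + rng.normal(scale=0.05, size=gt.shape)
    print(accel_error(jittery, gt))
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE measures articulated pose error frame by frame, whereas Accel is sensitive to frame-to-frame jitter that a per-frame metric cannot detect; video-based methods therefore commonly report both.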
Pages: 13