Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation

Cited by: 11

Authors:

Honari, Sina [1]

Constantin, Victor [1]

Rhodin, Helge [2]

Salzmann, Mathieu [1]

Fua, Pascal [1]

Affiliations:

[1] CVLab, EPFL, Lausanne, Switzerland

[2] Imager Lab, UBC, Vancouver, BC, Canada

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023, Vol. 45, No. 05

Keywords:

Crops; Feature extraction; TV; Cameras; Three-dimensional displays; Image reconstruction; Unsupervised learning; Temporal feature extraction; unsupervised representation learning; contrastive learning; 3D human pose;

DOI:

10.1109/TPAMI.2022.3215307

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this article, we propose an unsupervised feature extraction method to capture temporal information in monocular videos, where we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally distant ones as negative pairs, as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying the contrastive loss only to the time-variant features, encouraging a gradual transition on them between nearby and distant frames, and also reconstructing the input extracts rich temporal features that are well suited for human pose estimation. Our approach reduces error by about 50% compared to the standard CSS strategies, outperforms other unsupervised single-view methods, and matches the performance of multi-view techniques. When 2D pose is available, our approach can extract even richer latent features and improve the 3D pose estimation accuracy, outperforming other state-of-the-art weakly supervised methods.
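The central idea of the abstract, applying a contrastive loss only to the time-variant part of each latent vector, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `split_latent` and `info_nce`, the choice of a plain InfoNCE loss, the latent sizes, and the temperature. The gradual-transition term and the reconstruction term mentioned in the abstract are omitted.

```python
import numpy as np


def split_latent(z, variant_dim):
    """Split latents along the last axis into (time-variant, time-invariant) parts.

    Hypothetical helper: the first `variant_dim` entries are treated as time-variant.
    """
    return z[..., :variant_dim], z[..., variant_dim:]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor (not necessarily the paper's exact loss).

    anchor, positive: arrays of shape (d,); negatives: array of shape (n, d).
    """
    def unit(v):
        # L2-normalize so that dot products are cosine similarities.
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

    a, p, n = unit(anchor), unit(positive), unit(negatives)
    # Index 0 holds the positive similarity, the rest are negatives.
    logits = np.concatenate(([a @ p], n @ a)) / temperature
    logits = logits - logits.max()  # numerical stability
    return float(-(logits[0] - np.log(np.exp(logits).sum())))


# Toy usage: 5 frames with 8-dimensional latents, 3 of them time-variant (assumed sizes).
rng = np.random.default_rng(0)
z = rng.normal(size=(5, 8))
z_tv, z_ti = split_latent(z, variant_dim=3)

# Frame 1 is the nearby (positive) frame for frame 0; frames 3 and 4 are distant (negatives).
# The time-invariant part z_ti is deliberately left out of the contrastive loss.
loss = info_nce(z_tv[0], z_tv[1], z_tv[3:])
```

In this sketch the time-invariant component never enters the loss, so nothing pushes it apart across frames, which is the property the abstract contrasts with standard CSS strategies that apply the loss to the whole latent vector.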


Pages: 6415 - 6427

Page count: 13

50 items in total

[1] On the Effect of Temporal Information on Monocular 3D Human Pose Estimation
Brauer, Juergen
Gong, Wenjuan
Gonzalez, Jordi
Arens, Michael
2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCV WORKSHOPS), 2011
[2] LEARNING MONOCULAR 3D HUMAN POSE ESTIMATION WITH SKELETAL INTERPOLATION
Chen, Ziyi
Sugimoto, Akihiro
Lai, Shang-Hong
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4218 - 4222
[3] A survey on monocular 3D human pose estimation
Ji X.
Fang Q.
Dong J.
Shuai Q.
Jiang W.
Zhou X.
Virtual Reality and Intelligent Hardware, 2020, 2 (06) : 471 - 500
[4] Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos
Cheng, Yu
Wang, Bo
Yang, Bo
Tan, Robby T.
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1157 - 1165
[5] MONOCULAR 3D HUMAN POSE ESTIMATION BY CLASSIFICATION
Greif, Thomas
Lienhart, Rainer
Sengupta, Debabrata
2011 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2011
[6] On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos
Li, Zhi
Wang, Xuan
Wang, Fei
Jiang, Peilin
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2192 - 2201
[7] SPATIO-TEMPORAL ATTENTION GRAPH FOR MONOCULAR 3D HUMAN POSE ESTIMATION
Zhang, Lijun
Shao, Xiaohu
Li, Zhenghao
Zhou, Xiang-Dong
Shi, Yu
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1231 - 1235
[8] Adapted human pose: monocular 3D human pose estimation with zero real 3D pose data
Liu, Shuangjun
Sehgal, Naveen
Ostadabbas, Sarah
APPLIED INTELLIGENCE, 2022, 52 (12) : 14491 - 14506
[9] Learning to Augment Poses for 3D Human Pose Estimation in Images and Videos
Zhang, Jianfeng
Gong, Kehong
Wang, Xinchao
Feng, Jiashi
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (08) : 10012 - 10026
