Self-Supervised 3D Representation Learning of Dressed Humans From Social Media Videos

Times cited: 0

Authors:

Jafarian, Yasamin [1]

Park, Hyun Soo [1]

Affiliation:

[1] Univ Minnesota, Minneapolis, MN 55455 USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023, Vol. 45, No. 7

Keywords:

Depth estimation; dataset; high fidelity human reconstruction; normal estimation; single view 3D reconstruction; self-supervised learning

DOI:

10.1109/TPAMI.2022.3231558

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A key challenge in learning a visual representation of the high-fidelity 3D geometry of dressed humans lies in the limited availability of ground truth data (e.g., 3D scanned models), which degrades the performance of 3D human reconstruction when it is applied to real-world imagery. We address this challenge by leveraging a new data resource: a collection of social media dance videos that span diverse appearances, clothing styles, performances, and identities. Each video depicts the dynamic movements of the body and clothes of a single person but lacks 3D ground truth geometry. To learn a visual representation from these videos, we present a new self-supervised learning method that uses a local transformation to warp the predicted local geometry of the person in one image to that in another image captured at a different time instant. This enables self-supervision by enforcing temporal coherence across the predictions. In addition, we jointly learn depths and surface normals, which are highly responsive to local texture, wrinkles, and shading, by maximizing their geometric consistency. Our method is end-to-end trainable and produces high-fidelity depth estimates that capture fine geometry faithful to the input real image. We further provide a theoretical bound on self-supervised learning via an uncertainty analysis that characterizes its performance without training. We demonstrate that our method outperforms state-of-the-art human depth estimation and human shape recovery approaches on both real and rendered images.
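The abstract states that depths and surface normals are learned jointly by maximizing their geometric consistency. The sketch below illustrates one generic form such a consistency term can take; it is not the paper's formulation. It assumes an orthographic camera, unit pixel spacing, forward finite differences, and a "1 - cosine similarity" penalty, and the function names `normals_from_depth` and `depth_normal_consistency` are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def normals_from_depth(depth: List[List[float]]) -> List[List[Vec3]]:
    """Unit normals of the surface z = depth[y][x].

    Assumption (not from the paper): orthographic camera and unit pixel
    spacing, so the normal is proportional to (-dz/dx, -dz/dy, 1).
    Forward differences shrink the output grid to (H-1) x (W-1).
    """
    h, w = len(depth), len(depth[0])
    out: List[List[Vec3]] = []
    for y in range(h - 1):
        row: List[Vec3] = []
        for x in range(w - 1):
            dzdx = depth[y][x + 1] - depth[y][x]
            dzdy = depth[y + 1][x] - depth[y][x]
            row.append(normalize((-dzdx, -dzdy, 1.0)))
        out.append(row)
    return out


def depth_normal_consistency(
    depth: List[List[float]], pred_normals: List[List[Vec3]]
) -> float:
    """Mean (1 - cosine similarity) between separately predicted normals
    and normals derived from the predicted depth; 0 means fully consistent.

    pred_normals must cover at least the (H-1) x (W-1) derived grid.
    """
    derived = normals_from_depth(depth)
    total, count = 0.0, 0
    for y, row in enumerate(derived):
        for x, n_d in enumerate(row):
            n_p = normalize(pred_normals[y][x])
            cos = sum(a * b for a, b in zip(n_d, n_p))
            total += 1.0 - cos
            count += 1
    return total / count
```

For example, on a tilted plane `z = 0.5 * x`, predicted normals parallel to (-0.5, 0, 1) give a penalty near 0, while normals pointing straight at the camera, (0, 0, 1), are penalized by about 0.106. Minimizing such a term pushes the two prediction heads toward agreeing on local surface orientation.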

Pages: 8969-8983

Number of pages: 15
