Cross-view and Cross-pose Completion for 3D Human Understanding

Times cited: 0

Authors:

Armando, Matthieu [1]
Galaaoui, Salma [1]
Baradel, Fabien [1]
Lucas, Thomas [1]
Leroy, Vincent [1]
Bregier, Romain [1]
Weinzaepfel, Philippe [1]
Rogez, Gregory [1]

Affiliation:

[1] NAVER LABS Europe, Meylan, France

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024 | 2024

DOI:

10.1109/CVPR52733.2024.00150

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy of relying on general-purpose, object-centric image datasets such as ImageNet is limited by an important domain shift. On the other hand, collecting domain-specific ground truth such as 2D or 3D labels does not scale well. Therefore, we propose a pre-training approach based on self-supervised learning that works on human-centric data using only images. Our method uses pairs of images of humans: the first is partially masked and the model is trained to reconstruct the masked parts given the visible ones and a second image. It relies on both stereoscopic (cross-view) pairs and temporal (cross-pose) pairs taken from videos, in order to learn priors about 3D as well as human motion. We pre-train a model for body-centric tasks and one for hand-centric tasks. With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks, and obtain state-of-the-art performance, for instance when fine-tuning for model-based and model-free human mesh recovery.
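The pair-completion pretext task described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Images are small grayscale nested lists, and the patch size `p` and masking ratio are illustrative values. The loss is assumed to be a mean squared error over masked patches only, as in MAE-style masked image modeling. `predict_from_second_view` is a hypothetical placeholder for the paper's transformer: it simply copies co-located patches from the second image. All function names are hypothetical.

```python
import random
from typing import List, Sequence

Patch = List[float]  # one flattened p x p patch of grayscale pixel values


def patchify(image: Sequence[Sequence[float]], p: int) -> List[Patch]:
    """Split an H x W image (nested lists) into non-overlapping p x p patches, row-major."""
    h, w = len(image), len(image[0])
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by the patch size")
    return [
        [image[i + di][j + dj] for di in range(p) for dj in range(p)]
        for i in range(0, h, p)
        for j in range(0, w, p)
    ]


def random_mask(n: int, ratio: float, rng: random.Random) -> List[bool]:
    """Mark round(ratio * n) of the n patch positions as masked (True), chosen uniformly."""
    masked = set(rng.sample(range(n), round(ratio * n)))
    return [i in masked for i in range(n)]


def predict_from_second_view(first: List[Patch], mask: List[bool], second: List[Patch]) -> List[Patch]:
    """Hypothetical stand-in for the learned model: visible patches of the first image are
    kept, masked ones are filled with the co-located patch of the second image.
    The content of a masked patch of the first image is never read."""
    return [second[i] if m else first[i] for i, m in enumerate(mask)]


def masked_mse(target: List[Patch], pred: List[Patch], mask: List[bool]) -> float:
    """Mean squared error over the pixels of masked patches only (visible ones are ignored)."""
    errs = [
        (a - b) ** 2
        for t, q, m in zip(target, pred, mask)
        if m
        for a, b in zip(t, q)
    ]
    return sum(errs) / len(errs) if errs else 0.0


def completion_step(img1, img2, p: int = 2, ratio: float = 0.75, seed: int = 0) -> float:
    """One pretext step on an (img1, img2) pair: mask img1, complete it using img2, score it.
    p and ratio are illustrative assumptions, not values taken from the paper."""
    rng = random.Random(seed)
    first, second = patchify(img1, p), patchify(img2, p)
    mask = random_mask(len(first), ratio, rng)
    pred = predict_from_second_view(first, mask, second)
    return masked_mse(first, pred, mask)


# Usage: a toy 4 x 4 "first view" and a uniformly brighter "second view".
view_a = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
view_b = [[v + 1.0 for v in row] for row in view_a]
print(completion_step(view_a, view_a))  # 0.0: the second view is identical to the first
print(completion_step(view_a, view_b))  # 1.0: every masked pixel is off by exactly 1
```

The copy baseline only succeeds when the two images are pixel-aligned. In real cross-view or cross-pose pairs, corresponding content moves between the images. That is why completing the masked image requires a learned model that reasons about 3D geometry and human motion.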


Pages: 1512-1523

Number of pages: 12
