Cross-view and Cross-pose Completion for 3D Human Understanding

Cited by: 0

Authors:

Armando, Matthieu [1]

Galaaoui, Salma [1]

Baradel, Fabien [1]

Lucas, Thomas [1]

Leroy, Vincent [1]

Bregier, Romain [1]

Weinzaepfel, Philippe [1]

Rogez, Gregory [1]

Affiliations:

[1] NAVER LABS Europe, Meylan, France

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024 | 2024

Keywords:

DOI:

10.1109/CVPR52733.2024.00150

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy of relying on general purpose, object-centric image datasets such as ImageNet, is limited by an important domain shift. On the other hand, collecting domain-specific ground truth such as 2D or 3D labels does not scale well. Therefore, we propose a pre-training approach based on self-supervised learning that works on human-centric data using only images. Our method uses pairs of images of humans: the first is partially masked and the model is trained to reconstruct the masked parts given the visible ones and a second image. It relies on both stereoscopic (cross-view) pairs, and temporal (cross-pose) pairs taken from videos, in order to learn priors about 3D as well as human motion. We pre-train a model for body-centric tasks and one for hand-centric tasks. With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks, and obtain state-of-the-art performance for instance when fine-tuning for model-based and model-free human mesh recovery.
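The masked-pair objective described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the helper names (`split_patches`, `build_training_example`, `masked_mse`), the 0.75 mask ratio, the representation of patches as lists of floats, and the use of a mean squared error on the masked patches. The reconstruction model itself is not shown.

```python
import random


def split_patches(num_patches, mask_ratio, seed=0):
    """Randomly partition patch indices into (visible, masked) lists."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(round(num_patches * mask_ratio))
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def build_training_example(first_patches, second_patches, mask_ratio=0.75, seed=0):
    """Pair a partially masked first image with an unmasked second image.

    The second image may be another viewpoint (cross-view) or another
    frame of the same person (cross-pose). The 0.75 ratio is an assumed value.
    """
    visible, masked = split_patches(len(first_patches), mask_ratio, seed)
    return {
        "visible_first": {i: first_patches[i] for i in visible},
        "second": second_patches,
        "masked_indices": masked,
    }


def masked_mse(predicted, target, masked):
    """Mean squared error over the masked patches only (assumed loss)."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0
```

In this sketch a hypothetical model would receive `visible_first` and `second`, predict every patch of the first image, and be scored with `masked_mse` on `masked_indices` only, so the second image is the sole source of information about the hidden regions.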


Pages: 1512-1523

Number of pages: 12
