Cross-view and Cross-pose Completion for 3D Human Understanding

Authors
Armando, Matthieu [1]
Galaaoui, Salma [1]
Baradel, Fabien [1]
Lucas, Thomas [1]
Leroy, Vincent [1]
Bregier, Romain [1]
Weinzaepfel, Philippe [1]
Rogez, Gregory [1]
Affiliations
[1] NAVER LABS Europe, Meylan, France
DOI
10.1109/CVPR52733.2024.00150
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy of relying on general purpose, object-centric image datasets such as ImageNet, is limited by an important domain shift. On the other hand, collecting domain-specific ground truth such as 2D or 3D labels does not scale well. Therefore, we propose a pre-training approach based on self-supervised learning that works on human-centric data using only images. Our method uses pairs of images of humans: the first is partially masked and the model is trained to reconstruct the masked parts given the visible ones and a second image. It relies on both stereoscopic (cross-view) pairs, and temporal (cross-pose) pairs taken from videos, in order to learn priors about 3D as well as human motion. We pre-train a model for body-centric tasks and one for hand-centric tasks. With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks, and obtain state-of-the-art performance for instance when fine-tuning for model-based and model-free human mesh recovery.
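The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 14 x 14 patch grid, and the 75% masking ratio are assumptions chosen for illustration, and patches are plain lists of floats instead of image tensors. The sketch hides a random subset of patches in the first image of a pair, leaves the second image fully visible, and scores a reconstruction only on the hidden patches.

```python
import random


def make_masked_pair(num_patches, mask_ratio, rng):
    """Choose which patches of the FIRST image are hidden.

    The second image of the pair (another view, or another pose from the
    same video) is assumed to stay fully visible, so it needs no mask.
    Returns (visible_indices, masked_indices), both sorted.
    """
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)


def masked_mse(pred, target, masked):
    """Mean squared error over the masked patches only.

    pred and target are lists of patches; each patch is a list of floats.
    Visible patches are ignored because the model already sees them.
    """
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed setup: a 14 x 14 patch grid and a 75% masking ratio.
    visible, masked = make_masked_pair(14 * 14, 0.75, rng)
    print(len(visible), "visible patches,", len(masked), "masked patches")
```

In a real training loop, the visible patches of the first image and all patches of the second image would be fed to the model, and `masked_mse` would compare its output against the original pixels of the hidden patches.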
Pages: 1512-1523
Number of pages: 12