Cross-view and Cross-pose Completion for 3D Human Understanding

Authors
Armando, Matthieu [1]
Galaaoui, Salma [1]
Baradel, Fabien [1]
Lucas, Thomas [1]
Leroy, Vincent [1]
Bregier, Romain [1]
Weinzaepfel, Philippe [1]
Rogez, Gregory [1]
Affiliations
[1] NAVER LABS Europe, Meylan, France
DOI
10.1109/CVPR52733.2024.00150
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy of relying on general purpose, object-centric image datasets such as ImageNet, is limited by an important domain shift. On the other hand, collecting domain-specific ground truth such as 2D or 3D labels does not scale well. Therefore, we propose a pre-training approach based on self-supervised learning that works on human-centric data using only images. Our method uses pairs of images of humans: the first is partially masked and the model is trained to reconstruct the masked parts given the visible ones and a second image. It relies on both stereoscopic (cross-view) pairs, and temporal (cross-pose) pairs taken from videos, in order to learn priors about 3D as well as human motion. We pre-train a model for body-centric tasks and one for hand-centric tasks. With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks, and obtain state-of-the-art performance for instance when fine-tuning for model-based and model-free human mesh recovery.
Pages: 1512-1523
Page count: 12
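
The pair-based completion objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the patch size of 16, the 75% masking ratio, and the mean-squared-error loss are all assumptions chosen for illustration. The sketch only shows how a training sample might be assembled from a pair of images, with the first image partially masked and the second image left fully visible.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into (N, patch*patch*C) non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of patch"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)


def make_completion_sample(target, reference, patch=16, mask_ratio=0.75, seed=0):
    """Build one sample from an image pair (hypothetical helper).

    `target` is the image to be partially masked; `reference` is the second
    image of the pair (another viewpoint or another pose of the same person).
    patch=16 and mask_ratio=0.75 are assumed values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    tgt = patchify(target, patch)
    ref = patchify(reference, patch)
    n = tgt.shape[0]
    n_masked = int(round(mask_ratio * n))
    perm = rng.permutation(n)
    masked_idx = np.sort(perm[:n_masked])
    visible_idx = np.sort(perm[n_masked:])
    return {
        "visible_patches": tgt[visible_idx],  # what the model sees of the first image
        "visible_idx": visible_idx,
        "masked_idx": masked_idx,
        "reference_patches": ref,             # second image, fully visible
        "targets": tgt[masked_idx],           # pixels the model must reconstruct
    }


def reconstruction_loss(pred, targets):
    """Mean squared error on masked patches only (an assumed choice of loss)."""
    return float(np.mean((pred - targets) ** 2))


# Usage with random stand-in images; real pairs would come from multi-view
# captures (cross-view) or from two frames of a video (cross-pose).
rng = np.random.default_rng(1)
img_a = rng.random((64, 64, 3))
img_b = rng.random((64, 64, 3))
sample = make_completion_sample(img_a, img_b)
dummy_pred = np.zeros_like(sample["targets"])
loss = reconstruction_loss(dummy_pred, sample["targets"])
```

In this sketch a model would take `visible_patches` and `reference_patches` as input and be trained to predict `targets`; the cross-view and cross-pose cases differ only in how the image pair is selected.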