Cross-view and Cross-pose Completion for 3D Human Understanding

Authors
Armando, Matthieu [1]
Galaaoui, Salma [1]
Baradel, Fabien [1]
Lucas, Thomas [1]
Leroy, Vincent [1]
Bregier, Romain [1]
Weinzaepfel, Philippe [1]
Rogez, Gregory [1]
Affiliations
[1] NAVER LABS Europe, Meylan, France
DOI
10.1109/CVPR52733.2024.00150
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy of relying on general purpose, object-centric image datasets such as ImageNet, is limited by an important domain shift. On the other hand, collecting domain-specific ground truth such as 2D or 3D labels does not scale well. Therefore, we propose a pre-training approach based on self-supervised learning that works on human-centric data using only images. Our method uses pairs of images of humans: the first is partially masked and the model is trained to reconstruct the masked parts given the visible ones and a second image. It relies on both stereoscopic (cross-view) pairs, and temporal (cross-pose) pairs taken from videos, in order to learn priors about 3D as well as human motion. We pre-train a model for body-centric tasks and one for hand-centric tasks. With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks, and obtain state-of-the-art performance for instance when fine-tuning for model-based and model-free human mesh recovery.
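The pair-based masking described in the abstract can be illustrated with a minimal sketch. It only shows how the inputs of one training example might be assembled: the first image of a pair is split into visible and masked patches, and the second image is passed in full. The function names, the 75% mask ratio, and the 196-patch grid are assumptions for illustration and are not taken from the abstract; the transformer and the reconstruction loss are omitted.

```python
import random


def split_patches(num_patches, mask_ratio, seed=None):
    """Randomly split patch indices of the first image into (visible, masked)."""
    if not 0.0 < mask_ratio < 1.0:
        raise ValueError("mask_ratio must be strictly between 0 and 1")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(round(num_patches * mask_ratio))
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def build_completion_inputs(target_patches, reference_patches,
                            mask_ratio=0.75, seed=None):
    """Assemble one example: masked first image plus unmasked second image.

    The mask ratio is an assumed hyperparameter, not a value from the abstract.
    """
    visible_idx, masked_idx = split_patches(len(target_patches), mask_ratio, seed)
    return {
        # Patches of the first image that the model is allowed to see.
        "target_visible": [(i, target_patches[i]) for i in visible_idx],
        # Second image of the pair (another view or another pose), fully visible.
        "reference": list(enumerate(reference_patches)),
        # Patches of the first image that the model is trained to reconstruct.
        "reconstruct": [(i, target_patches[i]) for i in masked_idx],
    }


if __name__ == "__main__":
    # Placeholder patch contents; a real pipeline would use pixel arrays.
    first = [f"first_{i}" for i in range(196)]
    second = [f"second_{i}" for i in range(196)]
    example = build_completion_inputs(first, second, mask_ratio=0.75, seed=0)
    print(len(example["target_visible"]), len(example["reconstruct"]))
```

Under this sketch, a cross-view pair and a cross-pose pair differ only in where the second image comes from (another camera at the same instant, or another frame of the same video); the masking step is identical.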
Pages: 1512-1523
Page count: 12