MVContrast: Unsupervised Pretraining for Multi-view 3D Object Recognition

Cited by: 11
Authors
Wang, Luequan [1]
Xu, Hongbin [1]
Kang, Wenxiong [1]
Affiliations
[1] South China Univ Technol, Sch Automat Sci & Engn, Guangzhou 510641, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi view; unsupervised pretraining; contrastive learning; 3D vision; shape recognition;
DOI
10.1007/s11633-023-1430-z
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
3D shape recognition has drawn much attention in recent years, and view-based approaches currently perform best. However, current multi-view methods are almost all fully supervised, and the pretrained models they rely on are almost all based on ImageNet. Although the pretraining results of ImageNet are quite impressive, there is still a significant discrepancy between multi-view datasets and ImageNet: multi-view datasets naturally retain rich 3D information. In addition, large-scale datasets such as ImageNet require considerable cleaning and annotation work, so it is difficult to build a second dataset of that kind. In contrast, unsupervised learning methods can learn general feature representations without any extra annotation. To this end, we propose a three-stage unsupervised joint pretraining model. Specifically, we decouple the final representations into three fine-grained representations. Data augmentation is utilized to obtain pixel-level representations within each view. At the view level, we strengthen spatially invariant features. Finally, we exploit global information at the shape level through a novel extract-and-swap module. Experimental results demonstrate that the proposed method gains significantly in 3D object classification and retrieval tasks, and generalizes to cross-dataset tasks.
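The abstract names contrastive learning but this record does not give the paper's loss functions. As a rough illustration only, the following sketch shows a generic InfoNCE-style contrastive loss that pulls together embeddings of two views of the same shape and pushes apart embeddings of different shapes. Everything here is an assumption for illustration: the function names, the temperature of 0.1, and the random 16-dimensional vectors standing in for view embeddings are hypothetical and are not taken from MVContrast.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (not the paper's exact objective).

    Row i of `anchors` is treated as matching row i of `positives`;
    every other row of `positives` acts as a negative.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i with every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of picking the true match.
        total += log_denom - logits[i]
    return total / len(a)


random.seed(0)


def rand_vec(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


# Hypothetical embeddings: 8 shapes, 16-dimensional features.
view_a = [rand_vec(16) for _ in range(8)]
# A second "view" of each shape: the same embedding plus small noise.
view_b = [[x + 0.05 * random.gauss(0.0, 1.0) for x in v] for v in view_a]
# Embeddings with no relation to view_a.
unrelated = [rand_vec(16) for _ in range(8)]

loss_matched = info_nce(view_a, view_b)
loss_unrelated = info_nce(view_a, unrelated)
print("matched views:  ", loss_matched)
print("unrelated views:", loss_unrelated)
```

When the paired rows really are two views of the same shape, the loss is lower than when the pairing is arbitrary, which is the signal an unsupervised pretraining stage of this kind would optimize.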
Pages: 872-883
Number of pages: 12
Related papers
50 in total
  • [21] Track initialization and re-identification for 3D multi-view multi-object tracking
    Van Ma, Linh
    Nguyen, Tran Thien Dat
    Vo, Ba-Ngu
    Jang, Hyunsung
    Jeon, Moongu
    INFORMATION FUSION, 2024, 111
  • [22] 3D Point Cloud Recognition Based on a Multi-View Convolutional Neural Network
    Zhang, Le
    Sun, Jian
    Zheng, Qiang
    SENSORS, 2018, 18 (11)
  • [23] A COMPACT 3D REPRESENTATION FOR MULTI-VIEW VIDEO
    Salvador, Jordi
    Casas, Josep R.
    INTERNATIONAL CONFERENCE ON 3D IMAGING 2011 (IC3D 2011), 2011
  • [24] A method of multi-view intraoral 3D measurement
    Zhao, Huijie
    Wang, Zhen
    Jiang, Hongzhi
    Xu, Yang
    Lv, Peijun
    Sun, Yunchun
    INTERNATIONAL CONFERENCE ON PHOTONICS AND OPTICAL ENGINEERING (ICPOE 2014), 2015, 9449
  • [25] 3D object detection based on DST fusion multi-view fuzzy reasoning assignment
    Zhang C.-F.
    Li C.-W.-L.
    Zou Y.-Q.
    Jin N.
    Kongzhi yu Juece/Control and Decision, 2021, 36 (04) : 867 - 875
  • [26] VFMVAC: View-filtering-based multi-view aggregating convolution for 3D shape recognition and retrieval
    Liu, Zehua
    Zhang, Yuhe
    Gao, Jian
    Wang, Shurui
    PATTERN RECOGNITION, 2022, 129
  • [27] MHFP: Multi-view based hierarchical fusion pooling method for 3D shape recognition
    Liang, Qi
    Li, Qiang
    Zhang, Lihu
    Mi, Haixiao
    Nie, Weizhi
    Li, Xuanya
    PATTERN RECOGNITION LETTERS, 2021, 150 : 214 - 220
  • [28] PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition
    You, Haoxuan
    Feng, Yifan
    Ji, Rongrong
    Gao, Yue
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1310 - 1318
  • [29] MVTN: Learning Multi-view Transformations for 3D Understanding
    Hamdi, Abdullah
    AlZahrani, Faisal
    Giancola, Silvio
    Ghanem, Bernard
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (04) : 2197 - 2226
  • [30] OPTIMIZING CAMERA POSITIONS FOR MULTI-VIEW 3D RECONSTRUCTION
    Qian, Ningqing
    Lo, Chao-Yang
    2015 INTERNATIONAL CONFERENCE ON 3D IMAGING (IC3D), 2015