Mutual information-driven self-supervised point cloud pre-training

Cited: 0
Authors
Xu, Weichen [1 ]
Fu, Tianhao [1 ]
Cao, Jian [1 ]
Zhao, Xinyu [1 ]
Xu, Xinxin [1 ]
Cao, Xixin [1 ]
Zhang, Xing [1 ,2 ]
Affiliations
[1] Peking Univ, Sch Software & Microelect, Beijing 100871, Peoples R China
[2] Peking Univ, Shenzhen Grad Sch, Key Lab Integrated Microsyst, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Self-supervised learning; Autonomous driving; Point cloud scene understanding; Mutual information; High-level features; OPTIMIZATION;
DOI
10.1016/j.knosys.2024.112741
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning universal representations from unlabeled 3D point clouds is essential for improving the generalization and safety of autonomous driving. Generative self-supervised point cloud pre-training with low-level features as pretext tasks is the mainstream paradigm. However, from a mutual-information perspective, this approach is constrained by spatial information and entangled representations. In this study, we propose a generalized generative self-supervised point cloud pre-training framework called GPICTURE. High-level features were used as an additional pretext task to enhance the understanding of semantic information. Considering the varying difficulty caused by the discriminability of voxel features, we designed inter-class and intra-class discrimination-guided masking (I2Mask) to set the masking ratio adaptively. Furthermore, to ensure a hierarchical and stable reconstruction process, centered kernel alignment-guided hierarchical reconstruction and differential-gated progressive learning were employed to control the multiple reconstruction tasks. A complete theoretical analysis demonstrated that using high-level features as a pretext task increases the mutual information between the latent features and both the high-level features and the input point cloud. On Waymo, nuScenes, and SemanticKITTI, we achieved 75.55% mAP for 3D object detection, 79.7% mIoU for 3D semantic segmentation, and 18.8% mIoU for occupancy prediction. Notably, with only 50% of the fine-tuning data, GPICTURE approached the performance of training from scratch with 100% of the fine-tuning data. In addition, consistent visualizations with downstream tasks and a 57% reduction in weight disparity demonstrated a better fine-tuning starting point. The project page is hosted at https://gpicture-page.github.io/.
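The abstract mentions centered kernel alignment (CKA) as the similarity measure guiding hierarchical reconstruction, but the record does not spell out how it is computed. As background only, the commonly used linear form of CKA between two feature matrices can be sketched as follows; the function name and shapes here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices.

    X: (n_samples, d1), Y: (n_samples, d2) -- features for the same samples.
    Returns a similarity score in [0, 1]; 1 means the representations are
    identical up to an orthogonal transform and isotropic scaling.
    """
    # Center each feature dimension (column) to zero mean.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)
```

A score like this can be used to compare latent features across decoder stages; its invariance to orthogonal transforms makes it a stable choice for comparing layers of different widths.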
Pages: 16