Mutual information-driven self-supervised point cloud pre-training

Times Cited: 0
Authors
Xu, Weichen [1]
Fu, Tianhao [1]
Cao, Jian [1]
Zhao, Xinyu [1]
Xu, Xinxin [1]
Cao, Xixin [1]
Zhang, Xing [1,2]
Affiliations
[1] Peking Univ, Sch Software & Microelect, Beijing 100871, Peoples R China
[2] Peking Univ, Shenzhen Grad Sch, Key Lab Integrated Microsyst, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Self-supervised learning; Autonomous driving; Point cloud scene understanding; Mutual information; High-level features; Optimization
DOI
10.1016/j.knosys.2024.112741
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Learning universal representations from unlabeled 3D point clouds is essential for improving the generalization and safety of autonomous driving. Generative self-supervised point cloud pre-training with low-level features as pretext tasks is a mainstream paradigm. However, from a mutual-information perspective, this approach is constrained by spatial information and entangled representations. In this study, we propose a generalized generative self-supervised point cloud pre-training framework called GPICTURE. High-level features were used as an additional pretext task to enhance the understanding of semantic information. Considering the varying reconstruction difficulty caused by differences in the discriminability of voxel features, we designed inter-class and intra-class discrimination-guided masking (I2Mask) to set the masking ratio adaptively. Furthermore, to ensure a hierarchical and stable reconstruction process, centered kernel alignment (CKA)-guided hierarchical reconstruction and differential-gated progressive learning were employed to control the multiple reconstruction tasks. A complete theoretical analysis demonstrates that adding high-level features as a pretext task increases the mutual information between the latent features and both the high-level features and the input point cloud. On Waymo, nuScenes, and SemanticKITTI, we achieved 75.55% mAP for 3D object detection, 79.7% mIoU for 3D semantic segmentation, and 18.8% mIoU for occupancy prediction. Notably, with only 50% of the fine-tuning data, GPICTURE performed close to training from scratch on 100% of the fine-tuning data. In addition, visualizations consistent with the downstream tasks and a 57% reduction in weight disparity demonstrated a better fine-tuning starting point. The project page is hosted at https://gpicture-page.github.io/.
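To make the mutual-information argument concrete, the following is one plausible formalization, not taken verbatim from the paper: the notation z (latent feature), x (input point cloud), s (high-level semantic target), and the weight lambda are all assumed for illustration.

```latex
% One plausible formalization (notation assumed, not verbatim from the paper):
% z = latent feature, x = input point cloud, s = high-level semantic target.
\begin{align}
  \max_{\theta}\; & I(z;\, x)
      && \text{low-level reconstruction alone} \\
  \max_{\theta}\; & I(z;\, x) + \lambda\, I(z;\, s)
      && \text{with the added high-level pretext task} \\
  I(z;\, x, s) &= I(z;\, x) + I(z;\, s \mid x) \;\geq\; I(z;\, x)
      && \text{chain rule, since } I(z;\, s \mid x) \geq 0
\end{align}
```

The chain-rule line is the generic information-theoretic reason an extra target cannot decrease the joint mutual information; the paper's own analysis should be consulted for its exact bounds.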
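The abstract describes I2Mask only at a high level. The sketch below is purely illustrative, assuming that a per-voxel discrimination score (inter-class separation divided by intra-class compactness) is mapped to an adaptive masking ratio; the function i2mask_ratios, its arguments, and the [lo, hi] ratio range are all hypothetical and may differ from the paper's actual mechanism.

```python
import numpy as np

def i2mask_ratios(feats: np.ndarray, labels: np.ndarray,
                  lo: float = 0.4, hi: float = 0.9) -> np.ndarray:
    """Illustrative discrimination-guided masking in the spirit of I2Mask.

    feats:  (N, D) voxel features; labels: (N,) pseudo-class ids per voxel.
    Returns per-voxel masking ratios in [lo, hi]: well-discriminated voxels
    (far from other class centroids, close to their own) are masked more.
    """
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    scores = np.empty(len(feats))
    for i, (f, y) in enumerate(zip(feats, labels)):
        own = centroids[np.searchsorted(classes, y)]
        intra = np.linalg.norm(f - own)                 # intra-class compactness
        others = centroids[classes != y]
        inter = (np.linalg.norm(others - f, axis=1).min()
                 if len(others) else intra)             # inter-class separation
        scores[i] = inter / (intra + 1e-8)              # discrimination score
    # Normalize scores to [0, 1], then map onto the masking-ratio range.
    s = (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)
    return lo + s * (hi - lo)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 32))
    labels = rng.integers(0, 5, size=1000)
    ratios = i2mask_ratios(feats, labels)
    print(ratios.min(), ratios.max())  # stays within [0.4, 0.9]
```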
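Centered kernel alignment itself is standard; below is a minimal, self-contained implementation of linear CKA (Kornblith et al., 2019). How GPICTURE uses the scores to pair encoder layers with reconstruction targets is assumed here purely for the usage example.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA (Kornblith et al., 2019) between two feature matrices.

    X: (n, d1), Y: (n, d2) -- same n samples, arbitrary feature widths.
    Returns a similarity in [0, 1]; higher means more similar representations.
    """
    # Center each feature dimension over the samples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)

# Hypothetical usage: compare a layer's features against two candidate
# reconstruction targets; the linearly related pair scores higher.
rng = np.random.default_rng(0)
a = rng.normal(size=(256, 128))
b = a @ rng.normal(size=(128, 64))   # linearly related to a
c = rng.normal(size=(256, 64))       # unrelated
print(f"related: {linear_cka(a, b):.3f}  unrelated: {linear_cka(a, c):.3f}")
```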
Pages: 16
相关论文
共 50 条
  • [41] PointVST: Self-Supervised Pre-Training for 3D Point Clouds via View-Specific Point-to-Image Translation
    Zhang, Qijian
    Hou, Junhui
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (10) : 6900 - 6912
  • [42] S3T: SELF-SUPERVISED PRE-TRAINING WITH SWIN TRANSFORMER FOR MUSIC CLASSIFICATION
    Zhao, Hang
    Zhang, Chen
    Zhu, Bilei
    Ma, Zejun
    Zhang, Kejun
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 606 - 610
  • [43] Self-supervised Pre-training with Learnable Tokenizers for Person Re-Identification in Railway Stations
    Yang, Enze
    Li, Chao
    Liu, Shuoyan
    Liu, Yuxin
    Zhao, Shitao
    Huang, Nan
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 325 - 330
  • [44] SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech
    Lin, Jingru
    Ge, Meng
    Ao, Junyi
    Deng, Liqun
    Li, Haizhou
    INTERSPEECH 2024, 2024, : 597 - 601
  • [45] Self-Supervised Global Spatio-Temporal Interaction Pre-Training for Group Activity Recognition
    Du, Zexing
    Wang, Xue
    Wang, Qing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5076 - 5088
  • [46] WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
    Chen, Sanyuan
    Wang, Chengyi
    Chen, Zhengyang
    Wu, Yu
    Liu, Shujie
    Chen, Zhuo
    Li, Jinyu
    Kanda, Naoyuki
    Yoshioka, Takuya
    Xiao, Xiong
    Wu, Jian
    Zhou, Long
    Ren, Shuo
    Qian, Yanmin
    Qian, Yao
    Zeng, Michael
    Yu, Xiangzhan
    Wei, Furu
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1505 - 1518
  • [47] Self-supervised learning for point cloud data: A survey
    Zeng, Changyu
    Wang, Wei
    Nguyen, Anh
    Xiao, Jimin
    Yue, Yutao
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 237
  • [48] Self-Supervised Point Cloud Prediction for Autonomous Driving
    Du, Ronghua
    Feng, Rongying
    Gao, Kai
    Zhang, Jinlai
    Liu, Linhong
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (11) : 17452 - 17467
  • [49] SDCluster: A clustering based self-supervised pre-training method for semantic segmentation of remote sensing images
    Xu, Hanwen
    Zhang, Chenxiao
    Yue, Peng
    Wang, Kaixuan
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2025, 223 : 1 - 14
  • [50] Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks
    Ma, Duo
    Yue, Xianghu
    Ao, Junyi
    Gao, Xiaoxue
    Li, Haizhou
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2055 - 2059