Mutual information-driven self-supervised point cloud pre-training

Cited: 0
Authors
Xu, Weichen [1 ]
Fu, Tianhao [1 ]
Cao, Jian [1 ]
Zhao, Xinyu [1 ]
Xu, Xinxin [1 ]
Cao, Xixin [1 ]
Zhang, Xing [1 ,2 ]
Affiliations
[1] Peking Univ, Sch Software & Microelect, Beijing 100871, Peoples R China
[2] Peking Univ, Shenzhen Grad Sch, Key Lab Integrated Microsyst, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Self-supervised learning; Autonomous driving; Point cloud scene understanding; Mutual information; High-level features; Optimization;
DOI
10.1016/j.knosys.2024.112741
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Learning universal representations from unlabeled 3D point clouds is essential for improving the generalization and safety of autonomous driving. Generative self-supervised point cloud pre-training with low-level features as pretext tasks is a mainstream paradigm. However, from the perspective of mutual information, this approach is constrained by spatial information and entangled representations. In this study, we propose a generalized generative self-supervised point cloud pre-training framework called GPICTURE. High-level features are used as an additional pretext task to enhance the understanding of semantic information. Considering the varying difficulty of discriminating different voxel features, we design inter-class and intra-class discrimination-guided masking (I2Mask) to set the masking ratio adaptively. Furthermore, to ensure a hierarchical and stable reconstruction process, centered kernel alignment-guided hierarchical reconstruction and differential-gated progressive learning are employed to control the multiple reconstruction tasks. A complete theoretical analysis demonstrates that the high-level pretext task increases the mutual information between the latent features and both the high-level features and the input point cloud. On Waymo, nuScenes, and SemanticKITTI, we achieve 75.55% mAP for 3D object detection, 79.7% mIoU for 3D semantic segmentation, and 18.8% mIoU for occupancy prediction. Notably, with only 50% of the fine-tuning data, GPICTURE performs close to training from scratch on 100% of the fine-tuning data. In addition, visualizations consistent with the downstream tasks and a 57% reduction in weight disparity demonstrate a better fine-tuning starting point. The project page is hosted at https://gpicture-page.github.io/.
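The abstract names centered kernel alignment (CKA) as the signal that guides hierarchical reconstruction. For reference, the standard linear-CKA formulation (HSIC between double-centered Gram matrices, normalized by the self-similarities) can be sketched as below. This is a generic illustration of the metric, not the authors' implementation; the feature shapes and the interpretation of the two inputs as shallow- and deep-layer voxel features are assumptions for the example.

```python
import numpy as np

def _center_gram(k: np.ndarray) -> np.ndarray:
    """Double-center a Gram matrix: H K H with H = I - (1/n) * 11^T."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between feature matrices x (n, d1) and y (n, d2),
    whose rows are the same n samples under two representations."""
    kx = _center_gram(x @ x.T)
    ky = _center_gram(y @ y.T)
    hsic_xy = (kx * ky).sum()  # unnormalized HSIC estimate
    hsic_xx = (kx * kx).sum()
    hsic_yy = (ky * ky).sum()
    return float(hsic_xy / np.sqrt(hsic_xx * hsic_yy))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_a = rng.normal(size=(128, 256))  # assumed: shallow-layer voxel features
    feats_b = rng.normal(size=(128, 64))   # assumed: deep-layer voxel features
    print(f"linear CKA: {linear_cka(feats_a, feats_b):.4f}")
```

In the paper's setting, such a score would presumably be computed between features at different encoder depths to schedule the multiple reconstruction targets hierarchically; those details are not specified in the abstract and are assumptions here.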
Pages: 16