Mutual information-driven self-supervised point cloud pre-training

Times Cited: 0
Authors
Xu, Weichen [1]
Fu, Tianhao [1]
Cao, Jian [1]
Zhao, Xinyu [1]
Xu, Xinxin [1]
Cao, Xixin [1]
Zhang, Xing [1,2]
Affiliations
[1] Peking Univ, Sch Software & Microelect, Beijing 100871, Peoples R China
[2] Peking Univ, Shenzhen Grad Sch, Key Lab Integrated Microsyst, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Self-supervised learning; Autonomous driving; Point cloud scene understanding; Mutual information; High-level features; Optimization
DOI
10.1016/j.knosys.2024.112741
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Learning universal representations from unlabeled 3D point clouds is essential for improving the generalization and safety of autonomous driving. Generative self-supervised point cloud pre-training with low-level features as pretext tasks is a mainstream paradigm. However, from a mutual-information perspective, this approach is constrained by spatial information and entangled representations. In this study, we propose a generalized generative self-supervised point cloud pre-training framework called GPICTURE. High-level features were used as an additional pretext task to enhance the understanding of semantic information. Considering the varying reconstruction difficulty caused by differences in the discriminability of voxel features, we designed inter-class and intra-class discrimination-guided masking (I2Mask) to set the masking ratio adaptively. Furthermore, to ensure a hierarchical and stable reconstruction process, centered kernel alignment (CKA)-guided hierarchical reconstruction and differential-gated progressive learning were employed to control the multiple reconstruction tasks. A complete theoretical analysis demonstrates that adding high-level features as a pretext task increases the mutual information between the latent features and both the high-level features and the input point cloud. On Waymo, nuScenes, and SemanticKITTI, we achieved 75.55% mAP for 3D object detection, 79.7% mIoU for 3D semantic segmentation, and 18.8% mIoU for occupancy prediction. Notably, with only 50% of the fine-tuning data, GPICTURE performed close to training from scratch on 100% of the fine-tuning data. In addition, visualizations consistent with the downstream tasks and a 57% reduction in weight disparity demonstrated a better fine-tuning starting point. The project page is hosted at https://gpicture-page.github.io/.
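To make the mutual-information argument concrete, the following is one plausible formalization, not taken verbatim from the paper: the notation z (latent feature), x (input point cloud), s (high-level semantic target), and the weight lambda are all assumed for illustration.

```latex
% One plausible formalization (notation assumed, not verbatim from the paper):
% z = latent feature, x = input point cloud, s = high-level semantic target.
\begin{align}
  \max_{\theta}\; & I(z;\, x)
      && \text{low-level reconstruction alone} \\
  \max_{\theta}\; & I(z;\, x) + \lambda\, I(z;\, s)
      && \text{with the added high-level pretext task} \\
  I(z;\, x, s) &= I(z;\, x) + I(z;\, s \mid x) \;\geq\; I(z;\, x)
      && \text{chain rule, since } I(z;\, s \mid x) \geq 0
\end{align}
```

The chain-rule line is the generic information-theoretic reason an extra target cannot decrease the joint mutual information; the paper's own analysis should be consulted for its exact bounds.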
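The abstract describes I2Mask only at a high level. The sketch below is purely illustrative, assuming that a per-voxel discrimination score (inter-class separation divided by intra-class compactness) is mapped to an adaptive masking ratio; the function i2mask_ratios, its arguments, and the [lo, hi] ratio range are all hypothetical and may differ from the paper's actual mechanism.

```python
import numpy as np

def i2mask_ratios(feats: np.ndarray, labels: np.ndarray,
                  lo: float = 0.4, hi: float = 0.9) -> np.ndarray:
    """Illustrative discrimination-guided masking in the spirit of I2Mask.

    feats:  (N, D) voxel features; labels: (N,) pseudo-class ids per voxel.
    Returns per-voxel masking ratios in [lo, hi]: well-discriminated voxels
    (far from other class centroids, close to their own) are masked more.
    """
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    scores = np.empty(len(feats))
    for i, (f, y) in enumerate(zip(feats, labels)):
        own = centroids[np.searchsorted(classes, y)]
        intra = np.linalg.norm(f - own)                 # intra-class compactness
        others = centroids[classes != y]
        inter = (np.linalg.norm(others - f, axis=1).min()
                 if len(others) else intra)             # inter-class separation
        scores[i] = inter / (intra + 1e-8)              # discrimination score
    # Normalize scores to [0, 1], then map onto the masking-ratio range.
    s = (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)
    return lo + s * (hi - lo)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 32))
    labels = rng.integers(0, 5, size=1000)
    ratios = i2mask_ratios(feats, labels)
    print(ratios.min(), ratios.max())  # stays within [0.4, 0.9]
```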
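Centered kernel alignment itself is standard; below is a minimal, self-contained implementation of linear CKA (Kornblith et al., 2019). How GPICTURE uses the scores to pair encoder layers with reconstruction targets is assumed here purely for the usage example.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA (Kornblith et al., 2019) between two feature matrices.

    X: (n, d1), Y: (n, d2) -- same n samples, arbitrary feature widths.
    Returns a similarity in [0, 1]; higher means more similar representations.
    """
    # Center each feature dimension over the samples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)

# Hypothetical usage: compare a layer's features against two candidate
# reconstruction targets; the linearly related pair scores higher.
rng = np.random.default_rng(0)
a = rng.normal(size=(256, 128))
b = a @ rng.normal(size=(128, 64))   # linearly related to a
c = rng.normal(size=(256, 64))       # unrelated
print(f"related: {linear_cka(a, b):.3f}  unrelated: {linear_cka(a, c):.3f}")
```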
Pages: 16
相关论文
共 50 条
  • [41] PointVST: Self-Supervised Pre-Training for 3D Point Clouds via View-Specific Point-to-Image Translation
    Zhang, Qijian
    Hou, Junhui
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (10) : 6900 - 6912
  • [42] S3T: SELF-SUPERVISED PRE-TRAINING WITH SWIN TRANSFORMER FOR MUSIC CLASSIFICATION
    Zhao, Hang
    Zhang, Chen
    Zhu, Bilei
    Ma, Zejun
    Zhang, Kejun
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 606 - 610
  • [43] Self-supervised Pre-training with Learnable Tokenizers for Person Re-Identification in Railway Stations
    Yang, Enze
    Li, Chao
    Liu, Shuoyan
    Liu, Yuxin
    Zhao, Shitao
    Huang, Nan
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 325 - 330
  • [44] SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech
    Lin, Jingru
    Ge, Meng
    Ao, Junyi
    Deng, Liqun
    Li, Haizhou
    INTERSPEECH 2024, 2024, : 597 - 601
  • [45] Self-Supervised Global Spatio-Temporal Interaction Pre-Training for Group Activity Recognition
    Du, Zexing
    Wang, Xue
    Wang, Qing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5076 - 5088
  • [46] WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
    Chen, Sanyuan
    Wang, Chengyi
    Chen, Zhengyang
    Wu, Yu
    Liu, Shujie
    Chen, Zhuo
    Li, Jinyu
    Kanda, Naoyuki
    Yoshioka, Takuya
    Xiao, Xiong
    Wu, Jian
    Zhou, Long
    Ren, Shuo
    Qian, Yanmin
    Qian, Yao
    Zeng, Michael
    Yu, Xiangzhan
    Wei, Furu
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1505 - 1518
  • [47] Self-supervised learning for point cloud data: A survey
    Zeng, Changyu
    Wang, Wei
    Nguyen, Anh
    Xiao, Jimin
    Yue, Yutao
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 237
  • [48] Self-Supervised Point Cloud Prediction for Autonomous Driving
    Du, Ronghua
    Feng, Rongying
    Gao, Kai
    Zhang, Jinlai
    Liu, Linhong
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (11) : 17452 - 17467
  • [49] SDCluster: A clustering based self-supervised pre-training method for semantic segmentation of remote sensing images
    Xu, Hanwen
    Zhang, Chenxiao
    Yue, Peng
    Wang, Kaixuan
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2025, 223 : 1 - 14
  • [50] Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks
    Ma, Duo
    Yue, Xianghu
    Ao, Junyi
    Gao, Xiaoxue
    Li, Haizhou
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2055 - 2059