Curriculumformer: Taming Curriculum Pre-Training for Enhanced 3-D Point Cloud Understanding

Times Cited: 0
Authors:
Fei, Ben [1 ]
Luo, Tianyue [1 ]
Yang, Weidong [1 ]
Liu, Liwen [1 ]
Zhang, Rui [1 ]
He, Ying [2 ]
Affiliations:
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] Nanyang Technol Univ, Coll Comp & Data Sci, Singapore 639798, Singapore
Funding:
National Natural Science Foundation of China;
Keywords:
Point cloud compression; Transformers; Task analysis; Representation learning; Geometry; Data models; Accuracy; 3-D representation learning; curriculum learning; point clouds; self-supervised learning; transformer;
DOI:
10.1109/TNNLS.2024.3406587
CLC Classification:
TP18 [Artificial Intelligence Theory];
Discipline Codes:
081104 ; 0812 ; 0835 ; 1405 ;
Abstract:
Learning universal representations of 3-D point clouds is essential for reducing the need for manual annotation of large-scale and irregular point cloud datasets. The current modus operandi for representation learning is self-supervised learning, which has shown great potential for improving point cloud understanding. Nevertheless, how to employ auto-encoding to learn universal 3-D representations of irregularly structured point clouds remains an open problem, as previous methods focus on either global shapes or local geometries. To this end, we present a cascaded self-supervised point cloud representation learning framework, dubbed Curriculumformer, aiming to tame curriculum pre-training for enhanced point cloud understanding. Our main idea lies in devising a progressive pre-training strategy that trains the Transformer in an easy-to-hard manner. Specifically, we first pre-train the Transformer with an upsampling strategy, which allows it to learn global information. We then follow up with a completion strategy, which enables the Transformer to gain insight into local geometries. Finally, we propose a Multi-Modal Multi-Modality Contrastive Learning (M4CL) strategy that enhances representation learning by enriching the Transformer with semantic information. In this way, the pre-trained Transformer can be easily transferred to a wide range of downstream applications. We demonstrate the superior performance of Curriculumformer on various discriminative and generative tasks, outperforming state-of-the-art methods. Moreover, Curriculumformer can also be integrated into other off-the-shelf methods to boost their performance. Our code is available at https://github.com/Fayeben/Curriculumformer.
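
As a reading aid, the following is a minimal, hypothetical PyTorch sketch of the curriculum's first (easy) stage as described in the abstract. ToyPointEncoder, up_head, chamfer, and the random tensors are illustrative stand-ins under assumed shapes, not the authors' API; the real implementation is in the repository linked above.

import torch
import torch.nn as nn

class ToyPointEncoder(nn.Module):
    # Hypothetical stand-in for the paper's Transformer backbone.
    def __init__(self, dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, pts):  # pts: (B, N, 3) -> per-point features: (B, N, dim)
        return self.mlp(pts)

def chamfer(a, b):
    # Symmetric Chamfer distance between point sets a: (B, N, 3) and b: (B, M, 3).
    d = torch.cdist(a, b)  # (B, N, M) pairwise Euclidean distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

encoder = ToyPointEncoder()
up_head = nn.Linear(64, 4 * 3)  # hypothetical head: predicts 4 points per input point
opt = torch.optim.AdamW(list(encoder.parameters()) + list(up_head.parameters()), lr=1e-4)

# Stage 1 (easy): upsampling -- reconstruct a dense cloud from a sparse one,
# which mainly exercises global shape information.
for step in range(100):
    sparse = torch.randn(8, 256, 3)   # fake batch; real inputs would be sampled shapes
    dense = torch.randn(8, 1024, 3)
    feats = encoder(sparse)                   # (8, 256, 64)
    pred = up_head(feats).reshape(8, -1, 3)   # (8, 1024, 3) upsampled prediction
    loss = chamfer(pred, dense)
    opt.zero_grad(); loss.backward(); opt.step()

# Stages 2 (completion: mask a region and predict the missing local geometry) and
# 3 (the paper's M4CL contrastive objective, aligning point features with other
# modalities) would reuse the same encoder with their own heads and losses,
# giving the easy-to-hard progression.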
Pages: 1-15
Page Count: 15
Related Papers:
50 records in total
  • [1] Tang, Yuan; Li, Xianzhi; Xu, Jinfeng; Yu, Qiao; Hu, Long; Hao, Yixue; Chen, Min. Point-LGMask: Local and Global Contexts Embedding for Point Cloud Pre-Training With Multi-Ratio Masking. IEEE Transactions on Multimedia, 2024, 26: 8360-8370.
  • [2] Yang, Hongxin; Huang, Shangfeng; Wang, Ruisheng; Wang, Xin. Self-Supervised Pre-Training for 3-D Roof Reconstruction on LiDAR Data. IEEE Geoscience and Remote Sensing Letters, 2024, 21: 1-5.
  • [3] Sheng, Xiaoxiao; Shen, Zhiqiang; Wang, Longguang; Xiao, Gang. Learnable Query Contrast and Spatio-temporal Prediction on Point Cloud Video Pre-training. IEEE Latin America Transactions, 2024, 22(10): 821-828.
  • [4] Qin, Bowen; Hui, Binyuan; Wang, Lihan; Yang, Min; Li, Binhua; Huang, Fei; Si, Luo; Jiang, Qingshan; Li, Yongbin. Schema dependency-enhanced curriculum pre-training for table semantic parsing. Knowledge-Based Systems, 2023, 262.
  • [5] Li, Bing-He; Lu, Ching-Hu. Self-Training Enhanced Multitask Network for 3-D Point-Level Hybrid Scene Understanding for Autonomous Vehicles. IEEE Internet of Things Journal, 2024, 11(19): 31394-31406.
  • [6] Roggiolani, Gianmarco; Magistri, Federico; Guadagnino, Tiziano; Behley, Jens; Stachniss, Cyrill. Unsupervised Pre-Training for 3D Leaf Instance Segmentation. IEEE Robotics and Automation Letters, 2023, 8(11): 7448-7455.
  • [7] Wang, Yan; Zhao, Yining; Ying, Shihui; Du, Shaoyi; Gao, Yue. Rotation-Invariant Point Cloud Representation for 3-D Model Recognition. IEEE Transactions on Cybernetics, 2022, 52(10): 10948-10956.
  • [8] Xu, Weichen; Fu, Tianhao; Cao, Jian; Zhao, Xinyu; Xu, Xinxin; Cao, Xixin; Zhang, Xing. Mutual information-driven self-supervised point cloud pre-training. Knowledge-Based Systems, 2025, 307.
  • [9] Lu, Dening; Gao, Kyle; Xie, Qian; Xu, Linlin; Li, Jonathan. 3DGTN: 3-D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 1-13.
  • [10] Song, Yupeng; He, Fazhi; Duan, Yansong; Si, Tongzhen; Bai, Junwei. LSLPCT: An Enhanced Local Semantic Learning Transformer for 3-D Point Cloud Analysis. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60.