Point-LGMask: Local and Global Contexts Embedding for Point Cloud Pre-Training With Multi-Ratio Masking

Cited by: 8

Authors:

Tang, Yuan [1]

Li, Xianzhi [1]

Xu, Jinfeng [1]

Yu, Qiao [1]

Hu, Long [1]

Hao, Yixue [1]

Chen, Min [2,3]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China

[2] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510640, Peoples R China

[3] Pazhou Lab, Guangzhou 510640, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Funding:

National Natural Science Foundation of China;

Keywords:

Point cloud compression; Task analysis; Three-dimensional displays; Predictive models; Self-supervised learning; Representation learning; Context modeling; Local and global contexts embedding; self-supervised learning; point cloud understanding; representation learning;

DOI:

10.1109/TMM.2023.3282568

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Self-supervised learning has achieved great success in both natural language processing and 2D vision, where masked modeling is a popular pre-training scheme. However, extending masking to 3D point cloud understanding, which combines local and global features, poses a new challenge. In our work, we present Point-LGMask, a novel method to embed both local and global contexts with multi-ratio masking, an approach that is effective for self-supervised feature learning of point clouds but has been overlooked by existing pre-training works. Specifically, to avoid fitting to a fixed masking ratio, we first propose multi-ratio masking, which prompts the encoder to fully explore representative features through tasks of different difficulties. Next, to encourage the embedding of both local and global features, we formulate a compound loss, which consists of (i) a global representation contrastive loss to encourage the cluster assignments of the masked point clouds to be consistent with those of the complete input, and (ii) a local point cloud prediction loss to encourage accurate prediction of masked points. Equipped with our Point-LGMask, we show that our learned representations transfer well to various downstream tasks, including few-shot classification, shape classification, object part segmentation, as well as real-world scene-based 3D object detection and 3D semantic segmentation. In particular, our model substantially advances existing pre-training methods on the difficult few-shot classification task using the real-captured ScanObjectNN dataset, surpassing the second-best method by over 4%. Also, our Point-LGMask achieves 0.4% AP(25) and 0.8% AP(50) gains on the 3D object detection task over the second-best method. For semantic segmentation, our Point-LGMask surpasses the second-best method by 0.4% mAcc and 0.5% mIoU.
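
The multi-ratio masking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the ratios, the patch count, or how patches are formed, so the function name `multi_ratio_masks`, the ratios (0.4, 0.6, 0.9), and the 64-patch setup below are all hypothetical choices made only for illustration. The sketch assumes the point cloud has already been grouped into patches and simply draws one random patch mask per ratio, so that one input yields several reconstruction tasks of different difficulty.

```python
import numpy as np


def multi_ratio_masks(num_patches, ratios, rng):
    """Return one boolean mask per ratio; True marks a masked (hidden) patch.

    Hypothetical helper for illustration only, not taken from the paper.
    """
    masks = []
    for ratio in ratios:
        # Number of patches to hide at this ratio.
        num_masked = int(round(ratio * num_patches))
        # Pick the hidden patches uniformly at random, without replacement.
        hidden = rng.choice(num_patches, size=num_masked, replace=False)
        mask = np.zeros(num_patches, dtype=bool)
        mask[hidden] = True
        masks.append(mask)
    return masks


rng = np.random.default_rng(0)
ratios = (0.4, 0.6, 0.9)  # assumed values, not from the paper
masks = multi_ratio_masks(64, ratios, rng)
masked_counts = [int(m.sum()) for m in masks]
```

In a pre-training loop built along these lines, each mask would presumably be applied to the same patch embeddings, and the per-ratio losses (the global contrastive term and the local prediction term named in the abstract) would be combined; how the paper weights or schedules them is not stated here.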


Pages: 8360-8370

Page count: 11
