Point-LGMask: Local and Global Contexts Embedding for Point Cloud Pre-Training With Multi-Ratio Masking

Cited by: 6
Authors
Tang, Yuan [1]
Li, Xianzhi [1]
Xu, Jinfeng [1]
Yu, Qiao [1]
Hu, Long [1]
Hao, Yixue [1]
Chen, Min [2, 3]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[2] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510640, Peoples R China
[3] Pazhou Lab, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Point cloud compression; Task analysis; Three-dimensional displays; Predictive models; Self-supervised learning; Representation learning; Context modeling; Local and global contexts embedding; self-supervised learning; point cloud understanding; representation learning;
DOI
10.1109/TMM.2023.3282568
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Self-supervised learning has achieved great success in both natural language processing and 2D vision, where masked modeling is a popular pre-training scheme. However, extending masking to 3D point cloud understanding, which combines local and global features, poses a new challenge. We present Point-LGMask, a novel method that embeds both local and global contexts with multi-ratio masking, an approach that is quite effective for self-supervised feature learning of point clouds but has been ignored by existing pre-training works. Specifically, to avoid fitting to a fixed masking ratio, we first propose multi-ratio masking, which prompts the encoder to fully explore representative features through tasks of different difficulties. Next, to encourage the embedding of both local and global features, we formulate a compound loss, which consists of (i) a global representation contrastive loss that encourages the cluster assignments of the masked point clouds to be consistent with those of the complete input, and (ii) a local point cloud prediction loss that encourages accurate prediction of masked points. Equipped with Point-LGMask, our learned representations transfer well to various downstream tasks, including few-shot classification, shape classification, object part segmentation, and real-world scene-based 3D object detection and 3D semantic segmentation. In particular, our model substantially advances existing pre-training methods on the difficult few-shot classification task using the real-captured ScanObjectNN dataset, surpassing the second-best method by over 4%. Point-LGMask also achieves gains of 0.4% AP(25) and 0.8% AP(50) on the 3D object detection task over the second-best method. For semantic segmentation, Point-LGMask surpasses the second-best method by 0.4% mAcc and 0.5% mIoU.
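The two ideas named in the abstract, multi-ratio masking and a compound loss with a global and a local term, can be illustrated with a minimal sketch. It is not the authors' implementation. The masking ratios, the function names, the `local_weight` parameter, the use of plain cross-entropy for the global term, and the use of a symmetric Chamfer distance for the local prediction term are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random


def multi_ratio_masks(num_patches, ratios, rng):
    """Draw one random patch mask per masking ratio (True = masked).

    The ratios are hypothetical; the abstract only says several are used.
    """
    masks = []
    for ratio in ratios:
        num_masked = int(round(ratio * num_patches))
        masked = set(rng.sample(range(num_patches), num_masked))
        masks.append([i in masked for i in range(num_patches)])
    return masks


def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy between two discrete cluster-assignment distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance with squared Euclidean point distances.

    Assumed stand-in for the local point cloud prediction loss.
    """
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


def compound_loss(full_assign, masked_assign, pred_points, true_points,
                  local_weight=1.0):
    """Global consistency term plus weighted local prediction term.

    full_assign:   cluster assignment of the complete input (target)
    masked_assign: cluster assignment predicted from the masked input
    """
    global_term = cross_entropy(full_assign, masked_assign)
    local_term = chamfer_distance(pred_points, true_points)
    return global_term + local_weight * local_term


# Usage: three masks of increasing difficulty over 10 patches.
rng = random.Random(0)
masks = multi_ratio_masks(10, [0.3, 0.6, 0.9], rng)
```

Each mask would feed a separate masked view of the same point cloud to the encoder, and the compound loss would be accumulated over the views. How the views are weighted or combined is left open here, since the abstract gives no detail.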
Pages: 8360-8370
Number of pages: 11