Point-LGMask: Local and Global Contexts Embedding for Point Cloud Pre-Training With Multi-Ratio Masking

Cited by: 6
Authors
Tang, Yuan [1]
Li, Xianzhi [1]
Xu, Jinfeng [1]
Yu, Qiao [1]
Hu, Long [1]
Hao, Yixue [1]
Chen, Min [2, 3]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[2] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510640, Peoples R China
[3] Pazhou Lab, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Point cloud compression; Task analysis; Three-dimensional displays; Predictive models; Self-supervised learning; Representation learning; Context modeling; Local and global contexts embedding; self-supervised learning; point cloud understanding; representation learning;
DOI
10.1109/TMM.2023.3282568
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Self-supervised learning has achieved great success in both natural language processing and 2D vision, where masked modeling is a popular pre-training scheme. However, extending masking to 3D point cloud understanding, which combines local and global features, poses a new challenge. In our work, we present Point-LGMask, a novel method that embeds both local and global contexts with multi-ratio masking, an approach that is effective for self-supervised feature learning of point clouds but has been overlooked by existing pre-training works. Specifically, to avoid fitting to a fixed masking ratio, we first propose multi-ratio masking, which prompts the encoder to fully explore representative features through tasks of different difficulty. Next, to encourage the embedding of both local and global features, we formulate a compound loss, which consists of (i) a global representation contrastive loss that encourages the cluster assignments of the masked point clouds to be consistent with those of the complete input, and (ii) a local point cloud prediction loss that encourages accurate prediction of masked points. Equipped with Point-LGMask, we show that our learned representations transfer well to various downstream tasks, including few-shot classification, shape classification, object part segmentation, and real-world scene-based 3D object detection and 3D semantic segmentation. In particular, our model substantially improves on existing pre-training methods on the difficult few-shot classification task on the real-captured ScanObjectNN dataset, surpassing the second-best method by over 4%. Point-LGMask also achieves gains of 0.4% AP(25) and 0.8% AP(50) on the 3D object detection task over the second-best method. For semantic segmentation, Point-LGMask surpasses the second-best method by 0.4% mAcc and 0.5% mIoU.
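The multi-ratio masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `multi_ratio_masks`, the patch count, the ratio values, and the use of uniform random sampling are all assumptions chosen for illustration. The sketch only shows how one input, split into patches, might yield several masked views of different difficulty.

```python
import random


def multi_ratio_masks(num_patches, ratios, seed=None):
    """Build one boolean mask per ratio; True marks a masked patch.

    Hypothetical helper: the abstract does not specify how patches
    are chosen, so uniform random sampling is assumed here.
    """
    rng = random.Random(seed)
    masks = []
    for ratio in ratios:
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("mask ratio must lie in [0, 1]")
        # Number of patches hidden from the encoder at this ratio.
        num_masked = int(round(ratio * num_patches))
        masked_idx = set(rng.sample(range(num_patches), num_masked))
        masks.append([i in masked_idx for i in range(num_patches)])
    return masks


# Three views of the same 64-patch point cloud, from easy to hard.
# The ratios 0.4, 0.6, and 0.9 are illustrative values only.
masks = multi_ratio_masks(64, [0.4, 0.6, 0.9], seed=0)
masked_counts = [sum(m) for m in masks]
```

Each mask would then select which patches an encoder sees and which it must predict. A higher ratio leaves fewer visible patches and therefore gives a harder reconstruction task.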
Pages: 8360-8370
Number of pages: 11