Image Understands Point Cloud: Weakly Supervised 3D Semantic Segmentation via Association Learning

Cited by: 7
Authors
Sun, Tianfang [1 ]
Zhang, Zhizhong [1 ]
Tan, Xin [1 ,2 ]
Qu, Yanyun [3 ]
Xie, Yuan [1 ,2 ]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200060, Peoples R China
[2] East China Normal Univ, Chongqing Inst, Chongqing 401333, Peoples R China
[3] Xiamen Univ, Sch Informat, Dept Comp Sci & Technol, Xiamen 361005, Fujian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Point cloud compression; Three-dimensional displays; Labeling; Laser radar; Annotations; Training; Semantic segmentation; Multimodal; weakly supervised; point cloud semantic segmentation
DOI
10.1109/TIP.2024.3372449
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Weakly supervised point cloud semantic segmentation methods, which require 1% or fewer labels while aiming to match the performance of fully supervised approaches, have recently attracted extensive research attention. A typical solution in this framework uses self-training or pseudo-labeling to mine supervision from the point cloud itself while ignoring the critical information available from images. In fact, cameras are widely deployed alongside LiDAR, and this complementary information is highly valuable for 3D applications. In this paper, we propose a novel cross-modality weakly supervised method for 3D segmentation that incorporates complementary information from unlabeled images. We design a dual-branch network equipped with an active labeling strategy to maximize the value of a tiny fraction of labels and to directly realize 2D-to-3D knowledge transfer. We then establish a cross-modal self-training framework that iterates between parameter updating and pseudo-label estimation. In the training phase, we propose cross-modal association learning to mine complementary supervision from images by reinforcing the cycle consistency between 3D points and 2D superpixels. In the pseudo-label estimation phase, a pseudo-label self-rectification mechanism is derived to filter noisy labels, thus providing more accurate labels so that the networks can be fully trained. Extensive experimental results demonstrate that our method even outperforms state-of-the-art fully supervised competitors with less than 1% actively selected annotations.
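The cross-modal cycle consistency described in the abstract can be illustrated with a minimal sketch: each 3D point is matched to its most similar 2D superpixel by feature similarity, the superpixel is matched back to its most similar point, and pairs that return to their starting point are treated as reliable associations. This is a simplified illustration only, not the authors' implementation; the function name, cosine-similarity matching rule, and nearest-neighbor cycle test are assumptions.

```python
import numpy as np

def cycle_consistent_pairs(point_feats, sp_feats):
    """Illustrative cycle check between 3D point features (N, D)
    and 2D superpixel features (M, D).

    Returns the point->superpixel assignment and a boolean mask of
    points whose superpixel maps back to them (cycle-consistent)."""
    # L2-normalize so the dot product is cosine similarity
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    s = sp_feats / np.linalg.norm(sp_feats, axis=1, keepdims=True)
    sim = p @ s.T                        # (N, M) similarity matrix
    p2s = sim.argmax(axis=1)             # each point's best superpixel
    s2p = sim.argmax(axis=0)             # each superpixel's best point
    # a point is cycle-consistent if its superpixel's best point is itself
    consistent = s2p[p2s] == np.arange(point_feats.shape[0])
    return p2s, consistent
```

In a self-training loop, such cycle-consistent pairs could serve as the trusted associations across which 2D supervision is transferred to 3D, while inconsistent pairs are down-weighted or discarded.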
Pages: 1838 - 1852
Number of pages: 15