Image Understands Point Cloud: Weakly Supervised 3D Semantic Segmentation via Association Learning

Cited by: 7
Authors
Sun, Tianfang [1 ]
Zhang, Zhizhong [1 ]
Tan, Xin [1 ,2 ]
Qu, Yanyun [3 ]
Xie, Yuan [1 ,2 ]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200060, Peoples R China
[2] East China Normal Univ, Chongqing Inst, Chongqing 401333, Peoples R China
[3] Xiamen Univ, Sch Informat, Dept Comp Sci & Technol, Xiamen 361005, Fujian, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Point cloud compression; Three-dimensional displays; Labeling; Laser radar; Annotations; Training; Semantic segmentation; Multimodal; weakly supervised; point cloud semantic segmentation;
DOI
10.1109/TIP.2024.3372449
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Weakly supervised point cloud semantic segmentation methods, which aim to match the performance of fully supervised approaches with 1% or fewer labels, have recently attracted extensive research attention. A typical solution in this setting uses self-training or pseudo-labeling to mine supervision from the point cloud itself while ignoring critical information from images. In fact, cameras are widely deployed alongside LiDAR, and this complementary information is highly valuable for 3D applications. In this paper, we propose a novel cross-modal weakly supervised method for 3D segmentation that incorporates complementary information from unlabeled images. We design a dual-branch network equipped with an active labeling strategy to maximize the value of the tiny fraction of labels and to directly realize 2D-to-3D knowledge transfer. We then establish a cross-modal self-training framework that iterates between parameter updating and pseudo-label estimation. In the training phase, we propose cross-modal association learning to mine complementary supervision from images by reinforcing the cycle consistency between 3D points and 2D superpixels. In the pseudo-label estimation phase, a pseudo-label self-rectification mechanism filters noisy labels, providing more accurate labels so the networks can be fully trained. Extensive experimental results demonstrate that our method even outperforms state-of-the-art fully supervised competitors with less than 1% actively selected annotations.
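The pseudo-label estimation step described in the abstract can be illustrated, in a generic form, as confidence-based filtering of per-point predictions: low-confidence pseudo-labels are discarded before the next self-training round. This is a minimal sketch of that idea only; the function name, threshold value, and use of a plain softmax maximum as the confidence score are illustrative assumptions, not the paper's exact self-rectification mechanism.

```python
import numpy as np

def rectify_pseudo_labels(logits, threshold=0.9, ignore_index=-1):
    """Generic pseudo-label filtering sketch (hypothetical helper).

    logits: (N, C) array of per-point class scores.
    Returns (N,) labels; points whose softmax confidence falls
    below `threshold` are marked with `ignore_index` so they do
    not contribute to the next training round.
    """
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)

    confidence = probs.max(axis=1)          # per-point max probability
    labels = probs.argmax(axis=1)           # candidate pseudo-label
    labels[confidence < threshold] = ignore_index
    return labels
```

In a cross-modal variant of this step, the confidence score could additionally be gated by agreement between the 3D prediction and the associated 2D superpixel prediction, in line with the cycle consistency the paper describes.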
Pages: 1838-1852
Page count: 15