Retro-FPN: Retrospective Feature Pyramid Network for Point Cloud Semantic Segmentation

Cited by: 7
Authors
Xiang, Peng [1]
Wen, Xin [2]
Liu, Yu-Shen [1]
Zhang, Hui [1]
Fang, Yi [3]
Han, Zhizhong [4]
Affiliations
[1] Tsinghua University, School of Software, Beijing, People's Republic of China
[2] JD.com, Beijing, People's Republic of China
[3] New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
[4] Wayne State University, Detroit, MI, USA
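Source
2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023) | 2023
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
DOI
10.1109/ICCV51070.2023.01634
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract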
Learning per-point semantic features from the hierarchical feature pyramid is essential for point cloud semantic segmentation. However, most previous methods either suffer from ambiguous region features or fail to refine per-point features effectively, which leads to information loss and ambiguous semantic identification. To resolve this, we propose Retro-FPN, which models per-point feature prediction as an explicit and retrospective refining process that passes through all the pyramid layers to extract semantic features explicitly for each point. Its key novelty is a retro-transformer that summarizes semantic contexts from the previous layer and refines the features in the current layer accordingly. In this way, the categorization of each point is conditioned on its local semantic pattern. Specifically, the retro-transformer consists of a local cross-attention block and a semantic gate unit. The cross-attention block summarizes the semantic pattern retrospectively from the previous layer, and the gate unit incorporates the summarized contexts to refine the current semantic features. Retro-FPN is a pluggable neural network that applies to hierarchical decoders. By integrating Retro-FPN with three representative backbones, including both point-based and voxel-based methods, we show that Retro-FPN significantly improves performance over state-of-the-art backbones. Comprehensive experiments on widely used benchmarks justify the effectiveness of our design. The source code is available at https://github.com/AllenXiangX/Retro-FPN.
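
The two-part structure described in the abstract (a local cross-attention block followed by a semantic gate unit) can be illustrated with a minimal sketch. This is not the authors' implementation: the k-nearest-neighbor grouping, the scaled dot-product scores without learned projections, the per-channel sigmoid gate, and all function and variable names below are assumptions chosen for illustration only.

```python
import math
import random


def knn_indices(query_xyz, ref_xyz, k):
    """Indices of the k reference points nearest to query_xyz."""
    order = sorted(range(len(ref_xyz)),
                   key=lambda j: math.dist(query_xyz, ref_xyz[j]))
    return order[:k]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_cross_attention(cur_xyz, cur_feat, prev_xyz, prev_feat, k=3):
    """Each current point attends over its k nearest previous-layer points.

    Assumption: queries are the raw current features and keys/values are
    the raw previous-layer features (no learned projections).
    """
    dim = len(cur_feat[0])
    contexts = []
    for xyz, q in zip(cur_xyz, cur_feat):
        nbrs = knn_indices(xyz, prev_xyz, k)
        scores = [sum(a * b for a, b in zip(q, prev_feat[j])) / math.sqrt(dim)
                  for j in nbrs]
        weights = softmax(scores)
        # Weighted sum of neighbor features = summarized semantic context.
        contexts.append([sum(w * prev_feat[j][d] for w, j in zip(weights, nbrs))
                         for d in range(dim)])
    return contexts


def semantic_gate(cur_feat, contexts, gate_w, gate_b):
    """Blend context and current features with a per-channel sigmoid gate.

    Assumption: gate g = sigmoid(w0 * feature + w1 * context + b), and the
    output is g * context + (1 - g) * feature.
    """
    refined = []
    for f, c in zip(cur_feat, contexts):
        row = []
        for d in range(len(f)):
            z = gate_w[d][0] * f[d] + gate_w[d][1] * c[d] + gate_b[d]
            g = 1.0 / (1.0 + math.exp(-z))
            row.append(g * c[d] + (1.0 - g) * f[d])
        refined.append(row)
    return refined


# Toy data: 5 points in the previous layer, 8 points in the current layer.
random.seed(0)
dim = 4
prev_xyz = [[random.random() for _ in range(3)] for _ in range(5)]
prev_feat = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
cur_xyz = [[random.random() for _ in range(3)] for _ in range(8)]
cur_feat = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(8)]
gate_w = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(dim)]
gate_b = [0.0] * dim  # hypothetical gate parameters, randomly initialized

contexts = local_cross_attention(cur_xyz, cur_feat, prev_xyz, prev_feat, k=3)
refined = semantic_gate(cur_feat, contexts, gate_w, gate_b)
print(len(refined), len(refined[0]))
```

Because the gate lies in (0, 1), each refined channel in this sketch stays between the point's own feature and its summarized context, which is one simple way to read "carefully incorporates the summarized contexts". The released code at the URL above is the authoritative reference for the actual design.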
Pages: 17780-17792
Page count: 13