Retro-FPN: Retrospective Feature Pyramid Network for Point Cloud Semantic Segmentation

Cited by: 7

Authors:

Xiang, Peng [1]

Wen, Xin [2]

Liu, Yu-Shen [1]

Zhang, Hui [1]

Fang, Yi [3]

Han, Zhizhong [4]

Affiliations:

[1] Tsinghua Univ, Sch Software, Beijing, Peoples R China

[2] JD Com, Beijing, Peoples R China

[3] New York Univ Abu Dhabi, Abu Dhabi, U Arab Emirates

[4] Wayne State Univ, Detroit, MI USA

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China

Keywords:

DOI:

10.1109/ICCV51070.2023.01634

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Learning per-point semantic features from the hierarchical feature pyramid is essential for point cloud semantic segmentation. However, most previous methods suffered from ambiguous region features or failed to refine per-point features effectively, which leads to information loss and ambiguous semantic identification. To resolve this, we propose Retro-FPN to model the per-point feature prediction as an explicit and retrospective refining process, which goes through all the pyramid layers to extract semantic features explicitly for each point. Its key novelty is a retro-transformer for summarizing semantic contexts from the previous layer and accordingly refining the features in the current stage. In this way, the categorization of each point is conditioned on its local semantic pattern. Specifically, the retro-transformer consists of a local cross-attention block and a semantic gate unit. The cross-attention serves to summarize the semantic pattern retrospectively from the previous layer. And the gate unit carefully incorporates the summarized contexts and refines the current semantic features. Retro-FPN is a pluggable neural network that applies to hierarchical decoders. By integrating Retro-FPN with three representative backbones, including both point-based and voxel-based methods, we show that Retro-FPN can significantly improve performance over state-of-the-art backbones. Comprehensive experiments on widely used benchmarks can justify the effectiveness of our design. The source is available at https://github.com/AllenXiangX/Retro-FPN.
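
The abstract describes the retro-transformer as a local cross-attention block followed by a semantic gate unit. The following is a minimal sketch of that idea in plain Python, not the authors' implementation (see the linked repository for that). All function names are hypothetical, learned projections are omitted, and the gate is assumed here to be a simple sigmoid-weighted blend, which the abstract does not specify.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def local_cross_attention(query, neighbor_feats):
    # Summarize semantic context from the previous layer: the current
    # point's feature is the query, and the features of its local
    # neighbors in the previous layer act as keys and values.
    # (Assumption: no learned projections, scaled dot-product scores.)
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, feat)) / math.sqrt(d)
              for feat in neighbor_feats]
    weights = softmax(scores)
    return [sum(w * feat[c] for w, feat in zip(weights, neighbor_feats))
            for c in range(d)]

def semantic_gate(current, context):
    # Assumed gate: a per-channel sigmoid decides how much summarized
    # context replaces the current feature. A real gate would be learned.
    refined = []
    for x, c in zip(current, context):
        g = 1.0 / (1.0 + math.exp(-(x + c)))
        refined.append(g * c + (1.0 - g) * x)
    return refined

def retro_refine(current_feats, prev_feats, neighbor_idx):
    # For every point in the current layer, gather its neighbors in the
    # previous layer, summarize them, and refine the current feature.
    refined = []
    for feat, idx in zip(current_feats, neighbor_idx):
        neighbors = [prev_feats[j] for j in idx]
        context = local_cross_attention(feat, neighbors)
        refined.append(semantic_gate(feat, context))
    return refined
```

For example, `retro_refine(current_feats, prev_feats, neighbor_idx)` takes one feature vector per current-layer point, the previous layer's feature vectors, and for each current point a list of indices into the previous layer. Applying this step at every pyramid layer in turn would correspond to the "retrospective refining process" the abstract describes, under the assumptions stated above.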

Pages: 17780-17792

Page count: 13

