Rethinking Range View Representation for LiDAR Segmentation

Cited by: 68

Authors:

Kong, Lingdong [1,2]

Liu, Youquan [1,3]

Chen, Runnan [1,4]

Ma, Yuexin [5]

Zhu, Xinge [6]

Li, Yikang [1]

Hou, Yuenan [1]

Qiao, Yu [1]

Liu, Ziwei [7]

Affiliations:

[1] Shanghai AI Lab, Shanghai, Peoples R China

[2] Natl Univ Singapore, Singapore, Singapore

[3] Hsch Bremerhaven, Bremerhaven, Germany

[4] Univ Hong Kong, Hong Kong, Peoples R China

[5] Shanghai Tech Univ, Shanghai, Peoples R China

[6] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[7] Nanyang Technol Univ, S Lab, Singapore, Singapore

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023

Keywords:

DOI:

10.1109/ICCV51070.2023.00028

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

LiDAR segmentation is crucial for autonomous driving perception. Recent trends favor point- or voxel-based methods as they often yield better performance than the traditional range view representation. In this work, we unveil several key factors in building powerful range view models. We observe that the "many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections. We present RangeFormer - a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing - that better handles the learning and processing of LiDAR point clouds from the range view. We further introduce a Scalable Training from Range view (STR) strategy that trains on arbitrary low-resolution 2D range images, while still maintaining satisfactory 3D segmentation accuracy. We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.

Citation

Pages: 228-240

Page count: 13

81 references in total

[1] Aksoy EE, 2020, IEEE INT VEH SYM, P926, DOI [10.1109/IV47402.2020.9304694, 10.13140/rg.2.2.22837.83689]

[2] Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds. Alnaggar, Yara Ali; Afifi, Mohamed; Amer, Karim; ElHelw, Mohamed. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021: 1799-1808

[3] 3D-MiniNet: Learning a 2D Representation From Point Clouds for Fast and Efficient 3D LIDAR Semantic Segmentation. Alonso, Inigo; Riazuelo, Luis; Montesano, Luis; Murillo, Ana C. IEEE ROBOTICS AND AUTOMATION LETTERS, 2020, 5(4): 5432-5439

[4] RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving. Ando, Angelika; Gidaris, Spyros; Bursuc, Andrei; Puy, Gilles; Boulch, Alexandre; Marlet, Renaud. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 5240-5250

[5] SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. Behley, Jens; Garbade, Martin; Milioto, Andres; Quenzel, Jan; Behnke, Sven; Stachniss, Cyrill; Gall, Juergen. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 9296-9306

[6] The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. Berman, Maxim; Triki, Amal Rannen; Blaschko, Matthew B. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 4413-4421

[7] nuScenes: A multimodal dataset for autonomous driving. Caesar, Holger; Bankiti, Varun; Lang, Alex H.; Vora, Sourabh; Liong, Venice Erin; Xu, Qiang; Krishnan, Anush; Pan, Yu; Baldan, Giancarlo; Beijbom, Oscar. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 11618-11628

[8] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. Chen, Liang-Chieh; Papandreou, George; Kokkinos, Iasonas; Murphy, Kevin; Yuille, Alan L. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40(4): 834-848

[9] Chen Q, 2021, ADV NEUR IN

[10] CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP. Chen, Runnan; Liu, Youquan; Kong, Lingdong; Zhu, Xinge; Ma, Yuexin; Li, Yikang; Hou, Yuenan; Qiao, Yu; Wang, Wenping. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 7020-7030
