Multiview Fusion Driven 3-D Point Cloud Semantic Segmentation Based on Hierarchical Transformer

Cited by: 9

Authors:

Xu, Wang [1]

Li, Xu [1]

Ni, Peizhou [1]

Guang, Xingxing [2,3]

Luo, Hang [2,3]

Zhao, Xijun [2,3]

Affiliations:

[1] Southeast Univ, Sch Instrument Sci & Engn, Nanjing 210096, Peoples R China

[2] China North Artificial Intelligence & Innovat Res, Beijing 100072, Peoples R China

[3] Collective Intelligence & Collaborat Lab CIC, Beijing 100072, Peoples R China

Source:

IEEE SENSORS JOURNAL | 2023 / Vol. 23 / No. 24

Keywords:

3-D point cloud; multihead attention; multiview fusion; semantic segmentation

DOI:

10.1109/JSEN.2023.3328603

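Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809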

Abstract:

Three-dimensional semantic segmentation is a key task of environment understanding in various outdoor scenes. Due to the sparsity and varying density of point clouds, it is challenging to obtain fine-grained segmentation results. Previous point-based and voxel-based methods suffer from high computational cost. Recent 2-D projection-based methods, including range-view (RV), bird's-eye-view (BEV), and multiview fusion methods, can run in real time, but the information loss during projection leads to low accuracy. We also find that occlusion and interlacing problems exist in single-projection-based methods, and that most multiview fusion networks focus only on output-level fusion. Considering the above issues, we propose a multilevel multiview fusion network using attention modules and a hierarchical transformer, which ensures effectiveness and efficiency mainly through the following three aspects: 1) the spatial-channel attention module (SCAM) integrates contextual information between points and learns the differences among each channel's features; 2) the proposed geometry-based multiprojection fusion module (GMFM) achieves geometric feature alignment between RV and BEV and fuses the features of the two views at both the feature level and the output level; and 3) we introduce KPConv to replace KNN, which can reduce the information loss during postprocessing. Experiments are conducted on both structured and unstructured datasets, including the urban dataset SemanticKITTI and the off-road dataset Rellis3D. Our results achieve better performance than other projection-based methods and are comparable with the state-of-the-art Cylinder3D.
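
The abstract refers to range-view (RV) and bird's-eye-view (BEV) projections and to occlusion in single-projection methods. The following is a minimal sketch of the two standard projections, not the authors' implementation. The function names, the 64 x 2048 range-image size, the +3/-25 degree vertical field of view, and the 0.5 m BEV cell size are all illustrative assumptions that do not come from the record.

```python
import numpy as np

def range_view_indices(points, height=64, width=2048,
                       fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map each point (x, y, z) to a (row, col) pixel of a range image.

    All default values are assumed, illustrative sensor parameters.
    """
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    # Distance from the sensor; guard against division by zero.
    depth = np.maximum(np.linalg.norm(points[:, :3], axis=1), 1e-8)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(np.clip(points[:, 2] / depth, -1.0, 1.0))

    # Normalise the angles to pixel coordinates, then clip to the image.
    col = 0.5 * (1.0 - yaw / np.pi) * width
    row = (1.0 - (pitch - fov_down) / fov) * height
    col = np.clip(np.floor(col), 0, width - 1).astype(np.int64)
    row = np.clip(np.floor(row), 0, height - 1).astype(np.int64)
    return row, col, depth

def bev_indices(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), cell=0.5):
    """Map each point to an (ix, iy) cell of a top-down grid (assumed extents)."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(np.int64)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(np.int64)
    # Points outside the grid are flagged rather than clipped.
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    return ix, iy, valid

# Two points on the same ray collide in RV but are separated in BEV;
# two points stacked vertically collide in BEV but are separated in RV.
pts = np.array([[10.0, 0.0, 0.0], [20.0, 0.0, 0.0], [10.0, 0.0, -1.0]])
rv_row, rv_col, _ = range_view_indices(pts)
bev_x, bev_y, _ = bev_indices(pts)
```

In this toy input the first two points fall in the same RV pixel while landing in different BEV cells, and the first and third points share a BEV cell while landing in different RV rows. This is one way to see why the two views carry complementary information, which is the motivation the abstract gives for fusing them.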


Pages: 31461-31470

Page count: 10

