A Multi-Phase Camera-LiDAR Fusion Network for 3D Semantic Segmentation With Weak Supervision

Cited by: 14
Authors
Chang, Xuepeng [1 ]
Pan, Huihui [1 ,2 ]
Sun, Weichao [1 ]
Gao, Huijun [1 ,3 ]
Affiliations
[1] Harbin Inst Technol, Res Inst Intelligent Control & Syst, Harbin 150001, Peoples R China
[2] Ningbo Inst Intelligent Equipment Technol Co Ltd, Ningbo 315200, Peoples R China
[3] Yongjiang Lab, Ningbo 315202, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Autonomous driving; multi-modal fusion; 3D semantic segmentation; weak supervision
DOI
10.1109/TCSVT.2023.3241641
CLC codes
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline codes
0808; 0809
Abstract
Camera and LiDAR are indispensable perception units in autonomous driving, providing complementary environmental information for 3D semantic segmentation. Fusing the information from the two modalities is the key to accurate and robust semantic segmentation. However, three major factors restrict the performance of fusion-based methods: the reliability of image features, the differing contributions of individual image features, and the trade-off between the image and point-cloud predictions. This paper proposes a novel multi-phase fusion network for 3D semantic segmentation. For the first factor, this paper is the first to treat the risk that image features may be erroneous, owing to the lack of dense image annotations in common datasets, as a weak supervision problem, and introduces a weakly supervised loss. Second, the proposed attention-based feature fusion module filters and reweights the image features effectively. Third, the predictions of the two modalities are further fused at the pixel level by a self-confidence-based late fusion module so that their advantages complement each other. The proposed scheme has been evaluated on the nuScenes and SemanticKITTI benchmarks, and the results are competitive with state-of-the-art methods. Ablation studies demonstrate the method's superiority in segmenting sparse classes. In addition, robustness is evaluated: the proposed method remains relatively accurate even when one of the sensors fails.
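To make the weak-supervision idea concrete, here is a minimal PyTorch-style sketch, not the paper's actual loss: because common driving datasets lack dense image annotations, the image branch can be supervised only at the pixels onto which labeled LiDAR points project, with every other pixel ignored. All function names, shapes, and variable names are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def weakly_supervised_loss(image_logits, point_labels, pixel_uv):
        # image_logits: (C, H, W) per-pixel class logits from the image branch
        # point_labels: (N,) semantic labels of the N LiDAR points
        # pixel_uv:     (N, 2) integer (u, v) pixel coordinates where each
        #               point projects into the image
        sparse_logits = image_logits[:, pixel_uv[:, 1], pixel_uv[:, 0]].t()  # (N, C)
        # Cross-entropy only at pixels carrying a projected point label;
        # the rest of the image contributes no gradient.
        return F.cross_entropy(sparse_logits, point_labels)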
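The attention-based feature fusion can be pictured along the following lines; the module below is a hedged sketch under assumed dimensions and module names, not the authors' implementation. A learned gate reweights the image features sampled at each point's camera projection before fusion, so that unreliable image features are suppressed.

    import torch
    import torch.nn as nn

    class AttentionFeatureFusion(nn.Module):
        def __init__(self, point_dim=64, image_dim=64):
            super().__init__()
            # Predict per-channel weights for the image features from the
            # concatenated pair, allowing unreliable channels to be damped.
            self.gate = nn.Sequential(
                nn.Linear(point_dim + image_dim, image_dim),
                nn.Sigmoid(),
            )
            self.fuse = nn.Linear(point_dim + image_dim, point_dim)

        def forward(self, point_feats, image_feats):
            # point_feats: (N, point_dim) LiDAR features for N points
            # image_feats: (N, image_dim) image features sampled at the
            #              camera projection of each point
            w = self.gate(torch.cat([point_feats, image_feats], dim=-1))
            fused = torch.cat([point_feats, w * image_feats], dim=-1)
            return self.fuse(fused)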
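Likewise, self-confidence-based late fusion at the pixel level can be sketched as a confidence-weighted mixture of the two branches' class probabilities; the function below is an illustrative assumption rather than the paper's exact module.

    import torch

    def self_confidence_late_fusion(point_logits, image_logits,
                                    point_conf, image_conf):
        # point_logits, image_logits: (N, C) class logits from each branch,
        #                             aligned per projected point/pixel
        # point_conf, image_conf:     (N, 1) predicted self-confidence scores
        w = torch.softmax(torch.cat([point_conf, image_conf], dim=-1), dim=-1)
        probs = (w[:, :1] * point_logits.softmax(dim=-1)
                 + w[:, 1:] * image_logits.softmax(dim=-1))
        return probs.argmax(dim=-1)  # final per-point class prediction

In such a scheme, the confidence of a failing sensor's branch can collapse toward zero, which is one plausible way the fused prediction stays relatively accurate under single-sensor faults, consistent with the robustness result reported in the abstract.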
Pages: 3737-3746
Page count: 10