MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving

Cited by: 54
Authors
Li, Jiale [1 ]
Dai, Hang [2 ]
Han, Hao [3 ]
Ding, Yong [3 ]
Affiliations
[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou, Peoples R China
[2] Univ Glasgow, Sch Comp Sci, Glasgow, Lanark, Scotland
[3] Zhejiang Univ, Sch Micronano Elect, Hangzhou, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Keywords
REPRESENTATION;
DOI
10.1109/CVPR52729.2023.02078
CLC Classification
TP18 [Artificial intelligence theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
LiDAR and camera are two modalities available for 3D semantic segmentation in autonomous driving. The popular LiDAR-only methods severely suffer from inferior segmentation on small and distant objects due to insufficient laser points, while the robust multi-modal solution is under-explored, where we investigate three crucial inherent difficulties: modality heterogeneity, limited sensor field of view intersection, and multi-modal data augmentation. We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion to mitigate the modality heterogeneity. The multi-modal fusion in MSeg3D consists of geometry-based feature fusion (GF-Phase), cross-modal feature completion, and semantic-based feature fusion (SF-Phase) on all visible points. The multi-modal data augmentation is reinvigorated by applying asymmetric transformations on the LiDAR point cloud and multi-camera images individually, which benefits the model training with diversified augmentation transformations. MSeg3D achieves state-of-the-art results on the nuScenes, Waymo, and SemanticKITTI datasets. Under malfunctioning multi-camera input and multi-frame point cloud input, MSeg3D still shows robustness and improves over the LiDAR-only baseline. Our code is publicly available at https://github.com/jialeli1/lidarseg3d.
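The asymmetric augmentation idea in the abstract can be illustrated with a minimal sketch: the LiDAR point cloud and the camera images are transformed independently rather than with one shared transformation. The function names, parameter ranges, and NumPy-based implementation below are illustrative assumptions, not code from the MSeg3D repository.

```python
import numpy as np

def augment_points(points: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Geometric augmentation applied only to the LiDAR branch (N, 3) points."""
    angle = rng.uniform(-np.pi / 4, np.pi / 4)          # random yaw rotation
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    points = points @ rot.T
    if rng.random() < 0.5:                              # random flip across the x-axis
        points = points.copy()
        points[:, 1] = -points[:, 1]
    return points * rng.uniform(0.95, 1.05)             # random global scaling

def augment_image(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Photometric augmentation applied only to a camera image (H, W, 3)."""
    image = image.astype(np.float32)
    image *= rng.uniform(0.8, 1.2)                      # brightness jitter
    image += rng.normal(0.0, 2.0, size=image.shape)     # small additive noise
    return np.clip(image, 0.0, 255.0)                   # keep valid pixel range

rng = np.random.default_rng(0)
pts = rng.uniform(-50.0, 50.0, size=(1024, 3))
img = rng.uniform(0.0, 255.0, size=(32, 32, 3))
aug_pts = augment_points(pts, rng)                      # LiDAR-only transform
aug_img = augment_image(img, rng)                       # camera-only transform
```

Because the two branches draw independent random transformations, each training sample yields a more diverse pair of augmented inputs than a single synchronized transform would.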
Pages: 21694-21704
Page count: 11