D-S Augmentation: Density-Semantics Augmentation for 3-D Object Detection

Cited by: 2
Authors
Liu, Zhiqiang [1 ]
Shi, Peicheng [1 ]
Qi, Heng [1 ]
Yang, Aixi [2 ]
Affiliations
[1] Anhui Polytech Univ, Mech Engn Dept, Wuhu 241000, Peoples R China
[2] Zhejiang Univ, Polytech Inst, Hangzhou 310000, Peoples R China
Keywords
Point cloud compression; Three-dimensional displays; Object detection; Feature extraction; Laser radar; Semantics; Sensor phenomena and characterization; 3-D object detection; multimodal fusion; point-cloud density augmentation; point-cloud semantic augmentation; POINT CLOUD; LIDAR;
DOI
10.1109/JSEN.2022.3231882
CLC Number
TM [Electrotechnics]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808 ; 0809 ;
Abstract
Cameras and light detection and ranging (LiDAR) sensors are commonly used together in autonomous vehicles, providing rich color and texture information and precise object location information, respectively. However, fusing image and point-cloud information remains a key challenge. In this article, we propose D-S augmentation, a 3-D object detection method based on point-cloud density and semantic augmentation. Our approach first performs 2-D bounding box detection and instance segmentation on an image. Then, the LiDAR point cloud is projected onto the instance segmentation mask, and a fixed number of random points are generated within the mask. Finally, a global N-nearest-neighbor clustering associates the random points with the projected points to assign depth to the virtual points, completing point-cloud density augmentation (P-DA). Next, point-cloud semantic augmentation (P-SA) is performed: the instance segmentation mask of an object is used to associate it with the point cloud, the instance-level class labels and segmentation scores are assigned to the projected points, and the projected points, augmented with these 1-D features, are inversely mapped back to the point-cloud space to obtain a semantically augmented point cloud. We conducted extensive experiments on the nuScenes (Caesar et al., 2020) and KITTI (Geiger et al., 2012) datasets. The results demonstrate the effectiveness and efficiency of our proposed method. Notably, D-S augmentation outperformed a LiDAR-only baseline detector by +7.9% mean average precision (mAP) and +5.1% nuScenes detection score (NDS), and outperformed state-of-the-art multimodal fusion-based methods. We also present ablation studies showing that the fusion module improved the performance of a baseline detector.
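The P-DA step described in the abstract, giving depth to randomly generated virtual points by associating them with nearby projected LiDAR points in image space, can be sketched as follows. This is a minimal illustration assuming simple k-nearest-neighbor association on 2-D pixel coordinates; the function name, the mean-depth aggregation, and all variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def densify_with_virtual_points(proj_uv, proj_depth, mask_uv, k=1):
    """Hypothetical sketch of the P-DA association step:
    virtual pixels sampled inside an instance mask (mask_uv) borrow
    depth from their k nearest projected LiDAR points (proj_uv,
    proj_depth) in image space."""
    # Pairwise squared pixel distances: (num_virtual, num_projected).
    d2 = ((mask_uv[:, None, :] - proj_uv[None, :, :]) ** 2).sum(-1)
    # Indices of the k nearest projected points for each virtual pixel.
    nn = np.argsort(d2, axis=1)[:, :k]
    # Virtual depth = mean depth of the k nearest real projections.
    return proj_depth[nn].mean(axis=1)

# Toy example: two projected LiDAR points and two virtual pixels.
proj_uv = np.array([[10.0, 10.0], [50.0, 50.0]])   # projected pixel coords
proj_depth = np.array([5.0, 20.0])                 # their LiDAR depths
mask_uv = np.array([[12.0, 11.0], [49.0, 52.0]])   # random points in mask
print(densify_with_virtual_points(proj_uv, proj_depth, mask_uv))  # [ 5. 20.]
```

Each virtual pixel, combined with its borrowed depth, can then be back-projected into 3-D to densify the object's point cloud before detection.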
Pages: 2760-2772
Page count: 13
References (53 in total)
[1]  
Caesar H, 2020, PROC CVPR IEEE, P11618, DOI 10.1109/CVPR42600.2020.01164
[2]  
Chen Q., 2020, ADV NEURAL INFORM PR, P21224
[3]   Multi-View 3D Object Detection Network for Autonomous Driving [J].
Chen, Xiaozhi ;
Ma, Huimin ;
Wan, Ji ;
Li, Bo ;
Xia, Tian .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6526-6534
[4]  
Cho H, 2014, IEEE INT CONF ROBOT, P1836, DOI 10.1109/ICRA.2014.6907100
[5]   The Cityscapes Dataset for Semantic Urban Scene Understanding [J].
Cordts, Marius ;
Omran, Mohamed ;
Ramos, Sebastian ;
Rehfeld, Timo ;
Enzweiler, Markus ;
Benenson, Rodrigo ;
Franke, Uwe ;
Roth, Stefan ;
Schiele, Bernt .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223
[6]   Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis [J].
Dai, Angela ;
Qi, Charles Ruizhongtai ;
Niessner, Matthias .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6545-6554
[7]  
Deng JJ, 2021, AAAI CONF ARTIF INTE, V35, P1201
[8]  
Ding ZZ, 2020, Arxiv, DOI arXiv:2006.15505
[9]   RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection [J].
Fan, Lue ;
Xiong, Xuan ;
Wang, Feng ;
Wang, Naiyan ;
Zhang, Zhaoxiang .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :2898-2907
[10]  
Ge R., 2020, arXiv, DOI 10.48550/arXiv.2006.12671