ACF-Net: Asymmetric Cascade Fusion for 3D Detection With LiDAR Point Clouds and Images

Cited by: 14
Authors
Tian, Yonglin [1, 2]
Zhang, Xianjing [2]
Wang, Xiao [3]
Xu, Jintao [2]
Wang, Jiangong [1]
Ai, Rui [4]
Gu, Weihao [4]
Ding, Weiping [5]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
[2] Haomo Technol Co Ltd, AI Ctr, Beijing 100192, Peoples R China
[3] Anhui Univ, Sch Artificial Intelligence, Hefei 230031, Peoples R China
[4] Haomo Technol Co Ltd, Beijing 100192, Peoples R China
[5] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
Source
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES | 2024 / Vol. 9 / No. 2
Keywords
Three-dimensional displays; Feature extraction; Point cloud compression; Laser radar; Object detection; Timing; Fuses; 3D detection; autonomous driving; asymmetric fusion; cascade fusion; multimodal fusion; OBJECT; PERFORMANCE;
DOI
10.1109/TIV.2023.3341223
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognizing and exploiting the complementary information that arises from modality-intrinsic properties plays a crucial role in multimodal 3D detection. However, most current fusion-based 3D detectors follow symmetric fusion paradigms. They adopt early, middle, or late fusion, and in doing so they ignore the unequal status of data from different modalities. In this paper, we classify fusion by its timing and adopt an asymmetric cascade fusion network that exploits both the structural information in point clouds and the complementary semantic information in images. We propose a multi-stage cascade design for 3D object detection that iteratively refines predictions. Several late image features, consisting of detection clues, segmentation clues, and deep encoder features, are incorporated into different stages of the LiDAR branch. This preserves the integrity of the image features and enables deep multimodal interaction. In addition, two components address the down-sampling of voxelized features and possible mismatching between multimodal data. First, we propose proxy-based cross-modality sampling, which uses the coordinates of the high-density point cloud. Second, we develop an image degeneration process that simulates cross-modality matching noise for robust training. Extensive experiments on the KITTI and Waymo Open Dataset benchmarks validate the effectiveness of the proposed method.
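The abstract names two mechanisms without detailing them: proxy-based cross-modality sampling and an image degeneration process. The minimal sketch below uses a generic point-to-pixel projection to show the general idea. It is a hypothetical illustration and not the authors' implementation.

It rests on four assumptions:
- a 3x4 camera projection matrix `P`;
- a single-channel image feature map stored as nested lists;
- bilinear interpolation;
- treating a voxel's raw LiDAR points as its "proxies" and averaging their sampled features, which is our own interpretation.

The `degrade` helper jitters projected pixel coordinates. It is likewise a hypothetical stand-in for the paper's degeneration process, and all function names are illustrative.

```python
import random
from typing import List, Optional, Sequence, Tuple

Matrix34 = Sequence[Sequence[float]]  # assumed 3x4 projection matrix mapping points to pixels
FeatureMap = List[List[float]]        # assumed H x W single-channel image feature map
Point = Tuple[float, float, float]    # LiDAR point (x, y, z); P is assumed to accept it directly


def project_point(p: Matrix34, xyz: Point) -> Optional[Tuple[float, float]]:
    """Project a 3D point to continuous pixel coords (u, v); None if not in front of the camera."""
    hom = (xyz[0], xyz[1], xyz[2], 1.0)
    u, v, w = (sum(row[i] * hom[i] for i in range(4)) for row in p)
    if w <= 0.0:
        return None
    return u / w, v / w


def bilinear_sample(fmap: FeatureMap, u: float, v: float) -> float:
    """Bilinearly interpolate fmap at pixel (u, v); returns 0.0 outside the image."""
    height, width = len(fmap), len(fmap[0])
    if not (0.0 <= u <= width - 1 and 0.0 <= v <= height - 1):
        return 0.0
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, width - 1), min(v0 + 1, height - 1)
    du, dv = u - u0, v - v0
    top = (1 - du) * fmap[v0][u0] + du * fmap[v0][u1]
    bottom = (1 - du) * fmap[v1][u0] + du * fmap[v1][u1]
    return (1 - dv) * top + dv * bottom


def degrade(u: float, v: float, max_shift: float, rng: random.Random) -> Tuple[float, float]:
    """Hypothetical degeneration: uniformly jitter pixel coords to mimic matching noise."""
    return u + rng.uniform(-max_shift, max_shift), v + rng.uniform(-max_shift, max_shift)


def proxy_sample(p: Matrix34, fmap: FeatureMap, proxies: List[Point],
                 max_shift: float = 0.0, rng: Optional[random.Random] = None) -> float:
    """Average image features over a voxel's raw points (proxies) rather than its center."""
    rng = rng if rng is not None else random.Random(0)
    values = []
    for xyz in proxies:
        uv = project_point(p, xyz)
        if uv is None:
            continue  # skip points behind the camera
        u, v = uv
        if max_shift > 0.0:  # apply degeneration only when requested (e.g. training)
            u, v = degrade(u, v, max_shift, rng)
        values.append(bilinear_sample(fmap, u, v))
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    fmap = [[u + 10.0 * v for u in range(3)] for v in range(3)]
    print(proxy_sample(P, fmap, [(1.0, 1.0, 1.0), (1.5, 0.5, 1.0)]))  # 8.75
```

Several proxies inside one voxel project to distinct pixels. Averaging over them therefore draws on finer image detail than projecting only the down-sampled voxel center would. Setting `max_shift` to a positive value during training exposes the fused features to misaligned image samples.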
Pages: 3360-3371
Page count: 12
Related papers (50 in total)
[31]   SCDA-Net: Structure Completion and Density Awareness Network for LiDAR-Based 3D Object Detection [J].
Wu, Shuwen ;
Yang, Jinfu ;
Ma, Jiaqi ;
Zhang, Shaochen ;
Hao, Tianhao ;
Li, Mingai .
IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10 (05) :4268-4275
[32]   WBF-ODAL: Weighted Boxes Fusion for 3D Object Detection from Automotive LiDAR Point Clouds [J].
Katkoria, Dhvani ;
Sreevalsan-Nair, Jaya ;
Sati, Mayank ;
Karunakaran, Sunil .
2024 INTERNATIONAL CONFERENCE ON VEHICULAR TECHNOLOGY AND TRANSPORTATION SYSTEMS, ICVTTS, 2024,
[33]   Fully Sparse Fusion for 3D Object Detection [J].
Li, Yingyan ;
Fan, Lue ;
Liu, Yang ;
Huang, Zehao ;
Chen, Yuntao ;
Wang, Naiyan ;
Zhang, Zhaoxiang .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (11) :7217-7231
[34]   SAM-Net: LiDAR Depth Inpainting for 3D Static Map Generation [J].
Lee, Junhyeop ;
Hwang, Sangwon ;
Kim, Woo Jin ;
Lee, Sangyoun .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (08) :12213-12228
[36]   FuseNet: 3D Object Detection Network with Fused Information for Lidar Point Clouds [J].
Liu, Biao ;
Tian, Bihao ;
Wang, Hengyang ;
Qiao, Junchao ;
Wang, Zhi .
NEURAL PROCESSING LETTERS, 2022, 54 (06) :5063-5078
[37]   CLF3D: A Coarse-Labeling Framework to Facilitate 3D Object Detection in Point Clouds [J].
Cheng, Nuo ;
Luo, Chuanyu ;
Li, Han ;
Ma, Sikun ;
Lei, Shengguang ;
Li, Pu .
IEEE ACCESS, 2025, 13 :105753-105765
[38]   MCHFormer: A Multi-Cross Hybrid Former of Point-Image for 3D Object Detection [J].
Cao, Feng ;
Xue, Jun ;
Tao, Chongben ;
Luo, Xizhao ;
Gao, Zhen ;
Zhang, Zufeng ;
Zheng, Sifa ;
Zhu, Yuan .
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01) :383-394
[39]   Hierarchical Queries for 3D Lane Detection Based on Multi-Frame Point Clouds [J].
Liu, Ruixin ;
Yuan, Zejian .
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025,
[40]   CFPC: The Curbed Fake Point Collector to Pseudo-LiDAR-Based 3D Object Detection for Autonomous Vehicles [J].
Gao, Honghao ;
Shao, Jie ;
Iqbal, Muddesar ;
Wang, Ye ;
Xiang, Zhengzhe .
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2025, 74 (02) :1922-1934