Multi-modal Perception Fusion Method Based on Cross Attention

Cited by: 0
Authors
Zhang B.-L. [1 ,2 ]
Pan Z.-H. [1 ,2 ]
Jiang J.-Z. [1 ,2 ]
Zhang C.-B. [1 ,2 ]
Wang Y.-X. [1 ,2 ]
Yang C.-L. [1 ,2 ]
Affiliations
[1] School of Automobile and Traffic Engineering, Hefei University of Technology, Hefei, Anhui
[2] Anhui Engineering Laboratory of Intelligent Automobile, Hefei, Anhui
Source
Zhongguo Gonglu Xuebao/China Journal of Highway and Transport | 2024, Vol. 37, No. 3
Keywords
3D target detection; automotive engineering; cross-attention; information correction; late fusion; multimodal fusion
DOI
10.19721/j.cnki.1001-7372.2024.03.009
Abstract
To address the limited perception capability of single sensors and the complexity of late-fusion processing across multiple sensors in intelligent-vehicle road target detection, this study proposes a multimodal perception fusion method based on Transformer cross-attention. First, exploiting the ability of cross-attention to fuse multimodal information effectively, an end-to-end fusion perception network was constructed that receives the outputs of the visual and point cloud detection networks and performs late-fusion processing. Second, the 3D target detections of the point cloud network were subjected to high-recall processing and fed into the fusion network together with the 2D target detections output by the visual detector. The network then fused the 2D target information with the 3D information and output corrections to the 3D detections, yielding more accurate late-fusion detection results. Validation on the KITTI public dataset showed that, after introducing 2D detection information through the proposed fusion method, the comprehensive average improvements over the four benchmark methods PointPillars, PointRCNN, PV-RCNN, and CenterPoint across the car, cyclist, and pedestrian categories were 7.07%, 2.82%, 2.46%, and 1.60%, respectively. Compared with rule-based late-fusion methods, the proposed fusion network achieved average improvements of 1.88% and 4.90% on moderate- and hard-difficulty samples for pedestrians and cyclists, respectively, indicating stronger adaptability and generalization ability. Finally, a real-vehicle test platform was constructed for algorithm validation; a qualitative visual analysis of selected real-vehicle test scenarios verified the proposed detection method and network model under actual road conditions. © 2024 Chang'an University. All rights reserved.
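The abstract outlines a late-fusion pipeline in which high-recall 3D proposals from a LiDAR detector are corrected by attending over 2D camera detections. The following PyTorch sketch is illustrative only and is not the authors' implementation: the feature dimensions, embedding layers, and residual box-correction head are all assumptions, with torch.nn.MultiheadAttention standing in for the paper's Transformer cross-attention block.

# Minimal sketch (assumptions throughout, not the authors' code): 3D detections
# act as queries; 2D detections act as keys/values; the head predicts a residual
# correction to each 3D box plus a refined confidence score.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim_3d=7, dim_2d=5, d_model=128, num_heads=4):
        super().__init__()
        # Embed raw detection parameters into a shared feature space.
        # dim_3d: e.g. (x, y, z, l, w, h, yaw); dim_2d: e.g. (u1, v1, u2, v2, score).
        self.embed_3d = nn.Linear(dim_3d, d_model)
        self.embed_2d = nn.Linear(dim_2d, d_model)
        # Cross-attention: 3D queries attend over 2D keys/values.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.refine = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, dim_3d + 1),  # box residual + confidence logit
        )

    def forward(self, boxes_3d, dets_2d):
        # boxes_3d: (B, N, dim_3d) high-recall 3D proposals from the LiDAR branch
        # dets_2d:  (B, M, dim_2d) 2D detections from the image branch
        q = self.embed_3d(boxes_3d)
        kv = self.embed_2d(dets_2d)
        fused, _ = self.cross_attn(q, kv, kv)          # (B, N, d_model)
        out = self.refine(fused)                       # (B, N, dim_3d + 1)
        delta, conf = out[..., :-1], out[..., -1]
        return boxes_3d + delta, torch.sigmoid(conf)   # corrected boxes, scores

# Usage on dummy data:
model = CrossAttentionFusion()
boxes, scores = model(torch.randn(2, 50, 7), torch.randn(2, 20, 5))

In the paper's pipeline, the query side would carry the high-recall 3D detections and the key/value side the visual detector's 2D boxes, with the corrected 3D detections forming the final late-fusion output.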
Pages: 181-193
Page count: 12
References (30)
[1]  
BOCHKOVSKIY A, WANG C Y, LIAO H Y M., YOLOv4: Optimal speed and accuracy of object detection, arXiv preprint arXiv:2004.10934, (2020)
[2]  
LIN T Y, GOYAL P, GIRSHICK R, et al., Focal loss for dense object detection, 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980-2988, (2017)
[3]  
LANG A H, VORA S, CAESAR H, et al., PointPillars: Fast encoders for object detection from point clouds, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12697-12705, (2019)
[4]  
SHI S S, WANG X G, LI H S., PointRCNN: 3D object proposal generation and detection from point cloud, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-779, (2019)
[5]  
ZHU M, MA C, JI P, et al., Cross-modality 3D object detection, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 3772-3781, (2021)
[6]  
VORA S, LANG A H, HELOU B, et al., PointPainting: Sequential fusion for 3D object detection, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4604-4612, (2020)
[7]  
HUANG T T, LIU Z, CHEN X W, et al., EPNet: Enhancing point features with image semantics for 3D object detection, Computer Vision - ECCV 2020, pp. 35-52, (2020)
[8]  
SINDAGI V A, ZHOU Y, TUZEL O., MVX-Net: Multimodal VoxelNet for 3D object detection, 2019 International Conference on Robotics and Automation (ICRA), pp. 7276-7282, (2019)
[9]  
ZHU H Q, DENG J J, ZHANG Y, et al., VPFNet: Improving 3D object detection with virtual point based LiDAR and stereo data fusion, IEEE Transactions on Multimedia, 25, pp. 5291-5304, (2023)
[10]  
ZHANG Bing-li, ZHAN Ye-hui, PAN Da-wei, et al., Vehicle detection based on fusion of millimeter wave radar and machine vision, Automotive Engineering, 43, 4, pp. 478-484, (2021)