V2VFormer++: Multi-Modal Vehicle-to-Vehicle Cooperative Perception via Global-Local Transformer

Cited by: 9

Authors:

Yin, Hongbo [1]

Tian, Daxin [1]

Lin, Chunmian [1]

Duan, Xuting [1]

Zhou, Jianshan [1]

Zhao, Dezong [2]

Cao, Dongpu [3]

Affiliations:

[1] Beihang Univ, Sch Transportat Sci & Engn, Beijing Key Lab Cooperat Vehicle Infrastruct Syst, State Key Lab Intelligent Transportat Syst, Beijing 100191, Peoples R China

[2] Univ Glasgow, James Watt Sch Engn, Glasgow G12 8QQ, Scotland

[3] Univ Waterloo, Dept Mech & Mechatron Engn, Waterloo, ON N2L 3G1, Canada

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2024 / Vol. 25 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Vehicle-to-vehicle (V2V) cooperative perception; multi-modal fused perception; autonomous driving; transformer; 3D object detection; intelligent transportation systems;

DOI:

10.1109/TITS.2023.3314919

Chinese Library Classification (CLC) number:

TU [Building Science];

Discipline classification code:

0813;

Abstract:

Multi-vehicle cooperative perception has recently emerged to extend the long-range, large-scale perception ability of connected automated vehicles (CAVs). Nonetheless, many existing efforts formulate collaborative perception as a LiDAR-only 3D detection paradigm, neglecting the significance and complementarity of dense images. In this work, we construct the first multi-modal vehicle-to-vehicle cooperative perception framework, dubbed V2VFormer++, in which each vehicle's camera-LiDAR representation is combined by dynamic channel fusion (DCF) in bird's-eye-view (BEV) space, and ego-centric BEV maps from adjacent vehicles are aggregated by a global-local transformer module. Specifically, a channel-token mixer (CTM) with an MLP design is developed to capture the global response among neighboring CAVs, and position-aware fusion (PAF) further investigates the spatial correlation between each ego-networked map from a local perspective. In this manner, we can strategically determine which CAVs are desirable for collaboration and how to aggregate the most important information from them. Quantitative and qualitative experiments are conducted on the publicly available OPV2V and V2X-Sim 2.0 benchmarks, and the proposed V2VFormer++ reports state-of-the-art cooperative perception performance, demonstrating its effectiveness and advancement. Moreover, ablation studies and visualization analysis further suggest strong robustness against diverse disturbances from real-world scenarios.


Pages: 2153 - 2166

Number of pages: 14

