V2VFormer++: Multi-Modal Vehicle-to-Vehicle Cooperative Perception via Global-Local Transformer

Cited by: 9
Authors
Yin, Hongbo [1]
Tian, Daxin [1]
Lin, Chunmian [1]
Duan, Xuting [1]
Zhou, Jianshan [1]
Zhao, Dezong [2]
Cao, Dongpu [3]
Affiliations
[1] Beihang Univ, Sch Transportat Sci & Engn, Beijing Key Lab Cooperat Vehicle Infrastruct Syst, State Key Lab Intelligent Transportat Syst, Beijing 100191, Peoples R China
[2] Univ Glasgow, James Watt Sch Engn, Glasgow G12 8QQ, Scotland
[3] Univ Waterloo, Dept Mech & Mechatron Engn, Waterloo, ON N2L 3G1, Canada
Funding
National Natural Science Foundation of China;
Keywords
Vehicle-to-vehicle (V2V) cooperative perception; multi-modal fused perception; autonomous driving; transformer; 3D object detection; intelligent transportation systems;
DOI
10.1109/TITS.2023.3314919
Chinese Library Classification (CLC) number
TU [Building Science];
Discipline classification code
0813;
Abstract
Multi-vehicle cooperative perception has recently emerged to extend the long-range, large-scale perception ability of connected automated vehicles (CAVs). Nonetheless, most existing efforts formulate collaborative perception as a LiDAR-only 3D detection paradigm, neglecting the significance and complementarity of dense images. In this work, we construct the first multi-modal vehicle-to-vehicle cooperative perception framework, dubbed V2VFormer++, in which each vehicle's camera-LiDAR representation is fused by dynamic channel fusion (DCF) in bird's-eye-view (BEV) space, and ego-centric BEV maps from adjacent vehicles are aggregated by a global-local transformer module. Specifically, a channel-token mixer (CTM) with an MLP design is developed to capture the global response among neighboring CAVs, and position-aware fusion (PAF) further investigates the spatial correlation between each ego-networked map pair from a local perspective. In this manner, we can strategically determine which CAVs are desirable for collaboration and how to aggregate the most important information from them. Quantitative and qualitative experiments are conducted on the publicly available OPV2V and V2X-Sim 2.0 benchmarks, and the proposed V2VFormer++ reports state-of-the-art cooperative perception performance, demonstrating its effectiveness. Moreover, an ablation study and visualization analysis further suggest strong robustness against diverse disturbances from real-world scenarios.
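As a rough illustration of the "global response among neighboring CAVs" idea in the abstract, the sketch below mixes BEV feature maps across vehicles with two small MLPs, in the style of an MLP-Mixer block. It is not the paper's CTM implementation: treating each CAV as a token, the residual layout, the omission of normalization, and every name (`channel_token_mix`, `init_params`, the parameter keys, the hidden width) are assumptions made for illustration only.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def mlp(x, w1, w2):
    # Two-layer MLP applied along the last axis of x
    return gelu(x @ w1) @ w2


def init_params(n_agents, channels, hidden, rng):
    # Hypothetical parameter layout: one MLP over agents, one over channels
    scale = 0.02
    return {
        "tok_w1": scale * rng.standard_normal((n_agents, hidden)),
        "tok_w2": scale * rng.standard_normal((hidden, n_agents)),
        "ch_w1": scale * rng.standard_normal((channels, hidden)),
        "ch_w2": scale * rng.standard_normal((hidden, channels)),
    }


def channel_token_mix(feats, params):
    """Mix BEV features across agents, then across channels.

    feats: array of shape (N, C, H, W), one BEV map per CAV
           (assumed already warped into the ego frame).
    """
    n, c, h, w = feats.shape
    x = feats.reshape(n, c, h * w).transpose(2, 0, 1)  # (HW, N, C)

    # Token mixing: at each BEV cell and channel, mix values across the N agents
    tok = mlp(x.transpose(0, 2, 1), params["tok_w1"], params["tok_w2"])  # (HW, C, N)
    x = x + tok.transpose(0, 2, 1)

    # Channel mixing: at each BEV cell and agent, mix values across the C channels
    x = x + mlp(x, params["ch_w1"], params["ch_w2"])

    return x.transpose(1, 2, 0).reshape(n, c, h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((3, 8, 4, 4))  # 3 CAVs, 8 channels, 4x4 BEV grid
    mixed = channel_token_mix(feats, init_params(3, 8, 16, rng))
    print(mixed.shape)  # (3, 8, 4, 4)
```

Because the token-mixing MLP acts along the agent axis, the ego vehicle's output at a given cell depends on what its neighbors report at that cell. The local, position-aware step described in the abstract is not covered by this sketch.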
Pages: 2153-2166
Page count: 14
Related papers
41 in total
  • [1] V2VFormer: Vehicle-to-Vehicle Cooperative Perception With Spatial-Channel Transformer
    Lin, Chunmian
    Tian, Daxin
    Duan, Xuting
    Zhou, Jianshan
    Zhao, Dezong
    Cao, Dongpu
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (02): 3384 - 3395
  • [2] HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer
    Xiang, Hao
    Xu, Runsheng
    Ma, Jiaqi
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 284 - 295
  • [3] Multi-Modal Transformer With Global-Local Alignment for Composed Query Image Retrieval
    Xu, Yahui
    Bin, Yi
    Wei, Jiwei
    Yang, Yang
    Wang, Guoqing
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8346 - 8357
  • [4] Multi-modal Sensor Registration for Vehicle Perception via Deep Neural Networks
    Giering, Michael
    Venugopalan, Vivek
    Reddy, Kishore
    2015 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2015
  • [5] V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception
    Xu, Runsheng
    Xia, Xin
    Li, Jinlong
    Li, Hanzhao
    Zhang, Shuo
    Tu, Zhengzhong
    Meng, Zonglin
    Xiang, Hao
    Dong, Xiaoyu
    Song, Rui
    Yu, Hongkai
    Zhou, Bolei
    Ma, Jiaqi
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 13712 - 13722
  • [6] Progressively Hybrid Transformer for Multi-Modal Vehicle Re-Identification
    Pan, Wenjie
    Huang, Linhan
    Liang, Jianbao
    Hong, Lan
    Zhu, Jianqing
    SENSORS, 2023, 23 (09)
  • [7] V2I-BEVF: Multi-modal Fusion Based on BEV Representation for Vehicle-Infrastructure Perception
    Xiang, Chao
    Xie, Xiaopo
    Feng, Chen
    Bai, Zhen
    Niu, Zhendong
    Yang, Mingchuan
    2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023: 5292 - 5299
  • [8] Performance Analysis and Optimization of Relay-Assisted Vehicle-to-Vehicle (V2V) Cooperative Communication
    Ilhan, Haci
    Altunbas, Ibrahim
    Uysal, Murat
    2008 IEEE 16TH SIGNAL PROCESSING, COMMUNICATION AND APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2008: 225 - +
  • [9] V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer
    Xu, Runsheng
    Xiang, Hao
    Tu, Zhengzhong
    Xia, Xin
    Yang, Ming-Hsuan
    Ma, Jiaqi
    COMPUTER VISION, ECCV 2022, PT XXXIX, 2022, 13699 : 107 - 124
  • [10] Decentralized Cooperative Navigation for Vehicle-to-Vehicle (V2V) Applications using GPS Integrated with UWB Range
    Wang, Da
    O'Keefe, Kyle
    Petovello, Mark G.
    PROCEEDINGS OF THE ION 2013 PACIFIC PNT MEETING, 2013: 793 - 803