Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Cited by: 5
Authors
Dao, Minh-Quan [1 ]
Berrio, Julie Stephany [2 ]
Fremont, Vincent [1 ]
Shan, Mao [2 ]
Hery, Elwan [1 ]
Worrall, Stewart [2 ]
Affiliations
[1] Ecole Centrale Nantes, F-44300 Nantes, France
[2] University of Sydney, Australian Centre for Field Robotics (ACFR), Sydney, NSW 2008, Australia
Keywords
Collaborative perception; V2X; 3D object detection; deep learning; LiDAR
DOI
10.1109/TITS.2024.3371177
Chinese Library Classification
TU [Building Science];
Discipline Code
0813;
Abstract
Occlusion is a major challenge for LiDAR-based object detection as it renders regions of interest unobservable to the ego vehicle. One proposed solution is collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse viewpoints of connected agents (vehicles and intelligent roadside units) at multiple locations to form a complete scene representation. The central challenge of V2X collaboration is the performance-bandwidth tradeoff, which raises two questions: 1) which information should be exchanged over the V2X network, and 2) how the exchanged information should be fused. The current state of the art adopts mid-collaboration, in which Bird's-Eye View (BEV) images of point clouds are communicated, enabling deep interaction among connected agents while reducing bandwidth consumption. Although they achieve strong performance, the real-world deployment of most mid-collaboration approaches is hindered by their overly complicated architectures and unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method based on exchanging each agent's detection outputs, which achieves a better bandwidth-performance tradeoff while minimising the changes required to single-vehicle detection models. Moreover, we relax the synchronization assumptions of existing state-of-the-art approaches to require only a common time reference among connected agents, which can be achieved in practice using GPS time. Experiments on the V2X-Sim dataset show that our collaboration method reaches 76.72 mean Average Precision (mAP), 99% of the performance of early collaboration, while consuming as little bandwidth as late collaboration (0.01 MB on average). The code will be released at https://github.com/quan-dao/practical-collab-perception.
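The late-collaboration scheme the abstract outlines — each agent broadcasts only its detection outputs, stamped with a common (GPS-derived) time reference, and the ego vehicle fuses time-aligned boxes — can be sketched in Python as below. This is a minimal illustrative sketch, not the paper's implementation: the message fields, the constant-velocity propagation, and the fuse_detections helper are assumptions introduced here.

from dataclasses import dataclass
from typing import List

@dataclass
class DetectionMessage:
    # Hypothetical payload of a late-collaboration broadcast: detection
    # outputs plus a timestamp in a shared (e.g. GPS) time reference.
    agent_id: str
    timestamp: float               # seconds, common time reference
    boxes: List[List[float]]       # one [x, y, z, l, w, h, yaw] per object
    velocities: List[List[float]]  # one [vx, vy] per object, world frame

def propagate(msg: DetectionMessage, t_query: float) -> List[List[float]]:
    """Compensate inter-agent asynchrony with a constant-velocity model:
    advance each box from the sender's timestamp to the ego query time."""
    dt = t_query - msg.timestamp
    out = []
    for (x, y, z, l, w, h, yaw), (vx, vy) in zip(msg.boxes, msg.velocities):
        out.append([x + vx * dt, y + vy * dt, z, l, w, h, yaw])
    return out

def fuse_detections(ego_boxes, messages, t_query):
    """Pool the ego detections with time-aligned remote detections.
    A real pipeline would follow this with non-maximum suppression."""
    fused = list(ego_boxes)
    for msg in messages:
        fused.extend(propagate(msg, t_query))
    return fused

In practice, the pooled boxes would feed a deduplication step such as non-maximum suppression, and the detections themselves would come from each agent's unmodified single-vehicle detector, which is what keeps the exchanged payload small (about 0.01 MB on average in the reported experiments).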
Pages: 12163-12175
Number of pages: 13