Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Cited by: 5
Authors
Dao, Minh-Quan [1 ]
Berrio, Julie Stephany [2 ]
Fremont, Vincent [1 ]
Shan, Mao [2 ]
Hery, Elwan [1 ]
Worrall, Stewart [2 ]
Affiliations
[1] Ecole Centrale Nantes, F-44300 Nantes, France
[2] University of Sydney, Australian Centre for Field Robotics (ACFR), Sydney, NSW 2008, Australia
Keywords
Collaborative perception; V2X; 3D object detection; deep learning; LiDAR
DOI
10.1109/TITS.2024.3371177
Chinese Library Classification
TU [Building Science];
Discipline Code
0813;
Abstract
Occlusion is a major challenge for LiDAR-based object detection as it renders regions of interest unobservable to the ego vehicle. One proposed solution is collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse viewpoints of connected agents (vehicles and intelligent roadside units) at multiple locations to form a complete scene representation. The central challenge of V2X collaboration is the performance-bandwidth tradeoff, which raises two questions: 1) which information should be exchanged over the V2X network, and 2) how the exchanged information should be fused. The current state of the art adopts mid-collaboration, in which Bird's-Eye View (BEV) images of point clouds are communicated, enabling deep interaction among connected agents while reducing bandwidth consumption. Although they achieve strong performance, the real-world deployment of most mid-collaboration approaches is hindered by their overly complicated architectures and unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method based on exchanging each agent's detection outputs, which achieves a better bandwidth-performance tradeoff while minimising the changes required to single-vehicle detection models. Moreover, we relax the synchronization assumptions of existing state-of-the-art approaches to require only a common time reference among connected agents, which can be achieved in practice using GPS time. Experiments on the V2X-Sim dataset show that our collaboration method reaches 76.72 mean Average Precision (mAP), 99% of the performance of early collaboration, while consuming as little bandwidth as late collaboration (0.01 MB on average). The code will be released at https://github.com/quan-dao/practical-collab-perception.
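The late-collaboration scheme the abstract outlines — each agent broadcasts only its detection outputs, stamped with a common (GPS-derived) time reference, and the ego vehicle fuses time-aligned boxes — can be sketched in Python as below. This is a minimal illustrative sketch, not the paper's implementation: the message fields, the constant-velocity propagation, and the fuse_detections helper are assumptions introduced here.

from dataclasses import dataclass
from typing import List

@dataclass
class DetectionMessage:
    # Hypothetical payload of a late-collaboration broadcast: detection
    # outputs plus a timestamp in a shared (e.g. GPS) time reference.
    agent_id: str
    timestamp: float               # seconds, common time reference
    boxes: List[List[float]]       # one [x, y, z, l, w, h, yaw] per object
    velocities: List[List[float]]  # one [vx, vy] per object, world frame

def propagate(msg: DetectionMessage, t_query: float) -> List[List[float]]:
    """Compensate inter-agent asynchrony with a constant-velocity model:
    advance each box from the sender's timestamp to the ego query time."""
    dt = t_query - msg.timestamp
    out = []
    for (x, y, z, l, w, h, yaw), (vx, vy) in zip(msg.boxes, msg.velocities):
        out.append([x + vx * dt, y + vy * dt, z, l, w, h, yaw])
    return out

def fuse_detections(ego_boxes, messages, t_query):
    """Pool the ego detections with time-aligned remote detections.
    A real pipeline would follow this with non-maximum suppression."""
    fused = list(ego_boxes)
    for msg in messages:
        fused.extend(propagate(msg, t_query))
    return fused

In practice, the pooled boxes would feed a deduplication step such as non-maximum suppression, and the detections themselves would come from each agent's unmodified single-vehicle detector, which is what keeps the exchanged payload small (about 0.01 MB on average in the reported experiments).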
Pages: 12163-12175
Number of pages: 13