Learning spatio-temporal representation for cooperative 3D object detection and tracking

Cited by: 0
Authors
Xu, Libin [1]
Huang, Yingping [1]
Affiliations
[1] Univ Shanghai Sci & Technol, Sch Opt Elect & Comp Engn, Shanghai 200093, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai;
Keywords
Intelligent driving; Collaborative perception; 3D detection and tracking; Intermediate fusion;
DOI
10.1016/j.neunet.2025.107626
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Multi-agent collaborative perception, an emerging technology in intelligent driving, has attracted considerable attention in recent years. Despite advances in previous work, challenges remain due to inevitable localization errors, data sparsity, and bandwidth limitations. To address these challenges, a collaborative detection and tracking method, CoTrack, is proposed to balance perception effectiveness with communication efficiency. Specifically, a spatio-temporal aggregation module, consisting of a spatial cross-agent collaboration submodule and a temporal ego-agent enhancement submodule, is presented. The former dynamically integrates spatial semantics from multiple agents to alleviate feature misalignment caused by localization errors, while the latter captures the historical context of the ego-agent to compensate for the insufficiency of single-frame observations resulting from data sparsity. Additionally, an unsupervised feature compressor is designed to reduce communication volume. Furthermore, a two-stage online association strategy is developed to improve the matching success rate of detection-track pairs in the collaborative tracking task. Experimental results on both simulated and real datasets demonstrate that CoTrack achieves state-of-the-art performance in collaborative 3D object detection and tracking tasks while maintaining robustness in harsh and noisy environments.
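The abstract does not say what the two stages of the association strategy are. As a generic illustration only, and not CoTrack's actual method, the sketch below assumes a confidence cascade: tracks are first matched to high-score detections under a tight distance gate, and the tracks left over are then matched to low-score detections under a looser gate. The function names, the 2D center-distance cost, the greedy matcher, and all thresholds are hypothetical choices made for this example.

```python
from math import hypot


def greedy_match(tracks, dets, max_dist):
    """Greedily pair tracks and detections in order of ascending center distance.

    tracks: {track_id: (x, y)}; dets: {det_index: (x, y, score)}.
    Returns {track_id: det_index} for pairs no farther apart than max_dist.
    """
    pairs = sorted(
        (hypot(t[0] - d[0], t[1] - d[1]), ti, di)
        for ti, t in tracks.items()
        for di, d in dets.items()
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for dist, ti, di in pairs:
        if dist > max_dist:
            break  # pairs are sorted, so every later pair is also too far
        if ti in used_tracks or di in used_dets:
            continue
        matches[ti] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


def two_stage_associate(tracks, detections, score_thr=0.5, gate1=2.0, gate2=4.0):
    """Hypothetical two-stage association (all thresholds are assumptions).

    tracks: {track_id: (x, y)}; detections: list of (x, y, score).
    Returns (matches, unmatched_track_ids, unmatched_high_score_det_indices).
    """
    high = {i: d for i, d in enumerate(detections) if d[2] >= score_thr}
    low = {i: d for i, d in enumerate(detections) if d[2] < score_thr}

    # Stage 1: all tracks against high-score detections, tight gate.
    matches = greedy_match(tracks, high, gate1)

    # Stage 2: still-unmatched tracks against low-score detections, looser gate.
    remaining = {ti: t for ti, t in tracks.items() if ti not in matches}
    matches.update(greedy_match(remaining, low, gate2))

    unmatched_tracks = sorted(set(tracks) - set(matches))
    # Unmatched high-score detections are candidates for starting new tracks.
    unmatched_dets = sorted(set(high) - set(matches.values()))
    return matches, unmatched_tracks, unmatched_dets
```

In this sketch a low-score detection can keep an existing track alive but never starts a new one, which is one common motivation for splitting association into two passes.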
Pages: 10