Learning spatio-temporal representation for cooperative 3D object detection and tracking

Cited: 0
Authors
Xu, Libin [1]
Huang, Yingping [1]
Affiliations
[1] University of Shanghai for Science and Technology, School of Optical-Electrical and Computer Engineering, Shanghai 200093, People's Republic of China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China
Keywords
Intelligent driving; Collaborative perception; 3D detection and tracking; Intermediate fusion
DOI
10.1016/j.neunet.2025.107626
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multi-agent collaborative perception, an emerging technology in intelligent driving, has attracted considerable attention in recent years. Despite advancements in previous works, challenges remain due to inevitable localization errors, data sparsity, and bandwidth limitations. To address these challenges, a collaborative detection and tracking method, CoTrack, is proposed to balance perception effectiveness with communication efficiency. Specifically, a spatio-temporal aggregation module, consisting of a spatial cross-agent collaboration submodule and a temporal ego-agent enhancement submodule, is presented. The former dynamically integrates spatial semantics from multiple agents to alleviate feature misalignment caused by localization errors, while the latter captures the historical context of the ego-agent to compensate for the insufficiency of single-frame observations resulting from data sparsity. Additionally, an unsupervised feature compressor is designed to reduce communication volume. Furthermore, a two-stage online association strategy is developed to improve the matching success rate of detection-track pairs in the collaborative tracking task. Experimental results on both simulated and real-world datasets demonstrate that CoTrack achieves state-of-the-art performance in collaborative 3D object detection and tracking while maintaining robustness under harsh, noisy conditions.
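The abstract does not detail how the two-stage online association works. The following is a minimal illustrative sketch, not CoTrack's published implementation: it assumes a ByteTrack-style split of detections into high- and low-confidence pools, matched to tracks by bird's-eye-view (BEV) IoU with the Hungarian algorithm. All function names, thresholds, and the box format here are assumptions introduced for illustration.

```python
# Illustrative sketch only (not CoTrack's actual code): a generic
# two-stage online association that matches tracks first to
# high-confidence detections, then retries leftover tracks against
# low-confidence detections to recover occluded or distant objects.
import numpy as np
from scipy.optimize import linear_sum_assignment

def bev_iou(a, b):
    """Pairwise IoU between axis-aligned BEV boxes [x1, y1, x2, y2]."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def two_stage_associate(tracks, dets, scores, high_thr=0.6, iou_thr=0.3):
    """Return (matched (track, det) index pairs, unmatched track indices).

    Stage 1 matches all tracks against high-confidence detections;
    stage 2 retries the remaining tracks against the low-confidence
    pool, raising the matching success rate over a single hard cut.
    """
    pools = (np.where(scores >= high_thr)[0], np.where(scores < high_thr)[0])
    matches, unmatched = [], list(range(len(tracks)))
    for det_pool in pools:
        if not unmatched or det_pool.size == 0:
            continue
        iou = bev_iou(tracks[unmatched], dets[det_pool])
        rows, cols = linear_sum_assignment(-iou)  # Hungarian, maximize IoU
        keep = set(range(len(unmatched)))
        for r, c in zip(rows, cols):
            if iou[r, c] >= iou_thr:  # reject weak assignments
                matches.append((unmatched[r], det_pool[c]))
                keep.discard(r)
        unmatched = [unmatched[i] for i in sorted(keep)]
    return matches, unmatched
```

For example, a track at [0, 0, 2, 2] and detections at [0.1, 0, 2, 2] (score 0.9) and [5, 5, 6, 6] (score 0.3) yield one stage-1 match and no leftover tracks. The second pass matters when a tracked object's detection score drops below the high threshold, e.g. under occlusion.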
Pages: 10