Learning spatio-temporal representation for cooperative 3D object detection and tracking

Times cited: 0

Authors:

Xu, Libin [1]

Huang, Yingping [1]

Affiliation:

[1] Univ Shanghai Sci & Technol, Sch Opt Elect & Comp Engn, Shanghai 200093, Peoples R China

Source:

NEURAL NETWORKS | 2025 / Vol. 190

Funding:

National Natural Science Foundation of China; Natural Science Foundation of Shanghai;

Keywords:

Intelligent driving; Collaborative perception; 3D detection and tracking; Intermediate fusion;

DOI:

10.1016/j.neunet.2025.107626

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multi-agent collaborative perception, an emerging technology in intelligent driving, has attracted considerable attention in recent years. Despite advances in prior work, challenges remain owing to inevitable localization errors, data sparsity, and bandwidth limitations. To address these challenges, this paper proposes CoTrack, a collaborative detection and tracking method that balances perception effectiveness with communication efficiency. Specifically, it introduces a spatio-temporal aggregation module consisting of a spatial cross-agent collaboration submodule and a temporal ego-agent enhancement submodule. The former dynamically integrates spatial semantics from multiple agents to alleviate the feature misalignment caused by localization errors, while the latter captures the historical context of the ego-agent to compensate for the limited information in single-frame observations caused by data sparsity. In addition, an unsupervised feature compressor is designed to reduce communication volume. Finally, a two-stage online association strategy is developed to improve the matching success rate of detection-track pairs in the collaborative tracking task. Experimental results on both simulated and real-world datasets demonstrate that CoTrack achieves state-of-the-art performance in collaborative 3D object detection and tracking while remaining robust in harsh and noisy environments.
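The abstract says the spatial cross-agent collaboration submodule "dynamically integrates spatial semantics from multiple agents" but does not give its formulation. A minimal sketch of one common way to do this, assuming collaborator features have already been warped into the ego frame on a shared BEV grid, is per-cell scaled dot-product attention in which the ego feature is the query and every agent (ego included) supplies keys and values. The names `fuse_agents`, `softmax`, and the flattened `FeatureMap` layout are hypothetical; this is not CoTrack's actual module.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[Vector]  # hypothetical layout: flattened BEV grid, one C-dim vector per cell


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_agents(ego: FeatureMap, others: List[FeatureMap]) -> FeatureMap:
    """Fuse per-cell features across agents with scaled dot-product attention.

    Assumes every map in `others` is already warped into the ego frame and shares
    the ego grid, so index `cell` refers to the same location in every map.
    The ego feature is the query; all agents (ego included) supply keys and values.
    """
    agents = [ego] + others
    dim = len(ego[0])
    scale = 1.0 / math.sqrt(dim)
    fused: FeatureMap = []
    for cell, query in enumerate(ego):
        keys = [agent[cell] for agent in agents]
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
        weights = softmax(scores)  # one weight per agent, summing to 1
        fused.append([sum(w * key[c] for w, key in zip(weights, keys)) for c in range(dim)])
    return fused


if __name__ == "__main__":
    ego = [[1.0, 0.0], [0.5, 0.5]]
    collaborator = [[0.0, 1.0], [0.5, 0.5]]
    print(fuse_agents(ego, [collaborator]))
```

Because the weights come from similarity to the ego query, a collaborator feature that disagrees with the ego view at a given cell receives a smaller weight instead of being averaged in equally; where the features agree, the fused output equals the shared feature.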
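The abstract does not detail the two-stage online association strategy. One widely used two-stage scheme (the ByteTrack approach) first matches existing tracks to high-confidence detections and then lets the remaining tracks match low-confidence detections, so that sparse or partly occluded objects are not dropped. The sketch below assumes that scheme; it uses greedy nearest-center matching as a simple stand-in for an optimal solver such as the Hungarian method, and the thresholds `high_thr` and `max_dist` are hypothetical values, not CoTrack's settings.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]  # BEV object center (x, y) in metres


def greedy_match(tracks: Dict[int, Point], dets: Dict[int, Point],
                 max_dist: float) -> List[Tuple[int, int]]:
    """Pair tracks with detections greedily, closest centers first."""
    candidates = sorted(
        (math.dist(tp, dp), tid, did)
        for tid, tp in tracks.items()
        for did, dp in dets.items()
        if math.dist(tp, dp) <= max_dist
    )
    used_tracks: Set[int] = set()
    used_dets: Set[int] = set()
    pairs: List[Tuple[int, int]] = []
    for _, tid, did in candidates:
        if tid not in used_tracks and did not in used_dets:
            pairs.append((tid, did))
            used_tracks.add(tid)
            used_dets.add(did)
    return pairs


def two_stage_associate(tracks: Sequence[Point], detections: Sequence[Point],
                        scores: Sequence[float], high_thr: float = 0.5,
                        max_dist: float = 2.0):
    """ByteTrack-style association: high-score detections first, then low-score."""
    all_tracks = dict(enumerate(tracks))
    high = {i: d for i, d in enumerate(detections) if scores[i] >= high_thr}
    low = {i: d for i, d in enumerate(detections) if scores[i] < high_thr}

    # Stage 1: every track competes for the confident detections.
    first = greedy_match(all_tracks, high, max_dist)
    matched = {tid for tid, _ in first}

    # Stage 2: leftover tracks try to recover low-confidence detections.
    leftover = {tid: p for tid, p in all_tracks.items() if tid not in matched}
    second = greedy_match(leftover, low, max_dist)

    matches = first + second
    unmatched_tracks = sorted(set(all_tracks) - {tid for tid, _ in matches})
    # Unmatched high-score detections are candidates for new tracks.
    new_track_candidates = sorted(set(high) - {did for _, did in first})
    return matches, unmatched_tracks, new_track_candidates


if __name__ == "__main__":
    print(two_stage_associate(
        tracks=[(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)],
        detections=[(0.3, 0.0), (10.5, 0.0), (30.0, 0.0)],
        scores=[0.9, 0.3, 0.8],
    ))
```

In the example, track 0 is matched in the first stage, while track 1 survives only because the second stage pairs it with the low-score detection at (10.5, 0.0); a single-stage matcher that discarded low-score detections would have lost it.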

Pages: 10

References: 32 in total
