ShaSTA: Modeling Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking

Times Cited: 11
Authors
Sadjadpour, Tara [1 ]
Li, Jie [2 ]
Ambrus, Rares [3 ]
Bohg, Jeannette [1 ]
Affiliations
[1] Stanford Univ, Sch Engn, Comp Sci Dept, Stanford, CA 94305 USA
[2] NVIDIA, Santa Clara, CA 95051 USA
[3] Toyota Res Inst, Los Altos, CA 94022 USA
Keywords
Computer vision for transportation; deep learning for visual perception; visual tracking
DOI
10.1109/LRA.2023.3323124
CLC Classification
TP24 [Robotics]
Subject Classification
080202; 1405
Abstract
Multi-object tracking (MOT) is a cornerstone capability of any robotic system. Tracking quality is largely dependent on the quality of input detections. In many applications, such as autonomous driving, it is preferable to over-detect objects to avoid catastrophic outcomes due to missed detections. As a result, current state-of-the-art 3D detectors produce high rates of false positives to ensure a low number of false negatives. This can negatively affect tracking by making data association and track lifecycle management more challenging. Additionally, occasional false-negative detections due to difficult scenarios like occlusions can harm tracking performance. To address these issues in a unified framework, we propose ShaSTA, which learns shape and spatio-temporal affinities between tracks and detections in consecutive frames. The affinity is a probabilistic matching that leads to robust data association, track lifecycle management, false-positive elimination, false-negative propagation, and sequential track confidence refinement. We offer the first self-contained framework that addresses all aspects of the 3D MOT problem. We quantitatively evaluate ShaSTA on the nuScenes tracking benchmark with 5 metrics, including the most common tracking accuracy metric, AMOTA, to demonstrate how ShaSTA may impact the ultimate goal of an autonomous mobile agent. ShaSTA achieves 1st place amongst LiDAR-only trackers that use CenterPoint detections.
Pages: 4273-4280
Page count: 8
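The abstract describes matching tracks to detections via learned affinity scores that drive data association, with leftover detections flagged as false positives and unmatched tracks handled by false-negative propagation. As an illustration only, the following sketch performs greedy one-to-one association on a hand-built affinity matrix; in ShaSTA the affinities come from a learned network, and the `threshold` value here is a hypothetical cutoff, not one taken from the paper.

```python
def greedy_affinity_match(affinity, threshold=0.5):
    """Greedily match tracks (rows) to detections (columns) by descending affinity.

    affinity: list of lists with scores in [0, 1].
    Returns (matches, unmatched_tracks, unmatched_detections).
    """
    # Sort all (score, track, detection) triples from strongest to weakest.
    pairs = sorted(
        ((affinity[i][j], i, j)
         for i in range(len(affinity))
         for j in range(len(affinity[i]))),
        reverse=True,
    )
    matched_t, matched_d, matches = set(), set(), []
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are too weak to associate
        if i in matched_t or j in matched_d:
            continue  # enforce one-to-one assignment
        matches.append((i, j))
        matched_t.add(i)
        matched_d.add(j)
    unmatched_tracks = [i for i in range(len(affinity)) if i not in matched_t]
    n_det = len(affinity[0]) if affinity else 0
    unmatched_dets = [j for j in range(n_det) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets

# Example: 2 tracks, 3 detections. Detection 2 has low affinity to every
# track, so it remains unmatched -- a candidate false positive; conversely,
# an unmatched track would be a candidate for false-negative propagation.
aff = [[0.9, 0.2, 0.1],
       [0.3, 0.8, 0.2]]
print(greedy_affinity_match(aff))
# → ([(0, 0), (1, 1)], [], [2])
```

A production tracker would typically replace the greedy loop with an optimal assignment solver (e.g. the Hungarian algorithm), but the greedy variant keeps the role of the affinity threshold easy to see.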