ShaSTA: Modeling Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking

Cited by: 11
Authors
Sadjadpour, Tara [1]
Li, Jie [2]
Ambrus, Rares [3]
Bohg, Jeannette [1]
Affiliations
[1] Stanford Univ, Sch Engn, Comp Sci Dept, Stanford, CA 94305 USA
[2] NVIDIA, Santa Clara, CA 95051 USA
[3] Toyota Res Inst, Los Altos, CA 94022 USA
Keywords
Computer vision for transportation; deep learning for visual perception; visual tracking
DOI
10.1109/LRA.2023.3323124
Chinese Library Classification (CLC)
TP24 [Robotics]
Subject Classification Codes
080202; 1405
Abstract
Multi-object tracking (MOT) is a cornerstone capability of any robotic system. Tracking quality depends largely on the quality of the input detections. In many applications, such as autonomous driving, it is preferable to over-detect objects to avoid catastrophic outcomes caused by missed detections. As a result, current state-of-the-art 3D detectors produce high rates of false positives to ensure a low number of false negatives. This can negatively affect tracking by making data association and track lifecycle management more challenging. Additionally, occasional false-negative detections due to difficult scenarios like occlusions can harm tracking performance. To address these issues in a unified framework, we propose ShaSTA, which learns shape and spatio-temporal affinities between tracks and detections in consecutive frames. The affinity is a probabilistic matching that leads to robust data association, track lifecycle management, false-positive elimination, false-negative propagation, and sequential track confidence refinement. We offer the first self-contained framework that addresses all aspects of the 3D MOT problem. We quantitatively evaluate ShaSTA on the nuScenes tracking benchmark using five metrics, including AMOTA, the most common tracking-accuracy metric, to demonstrate how ShaSTA may impact the ultimate goal of an autonomous mobile agent. ShaSTA achieves first place among LiDAR-only trackers that use CenterPoint detections.
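
Since the abstract describes affinity-based association only at a high level, the following minimal Python sketch illustrates how a probabilistic track-detection affinity matrix might drive data association and lifecycle decisions. This is not the authors' implementation: ShaSTA's learned shape and spatio-temporal affinity network is stubbed out as an input matrix, and the function name, the 0.5 acceptance threshold, and the use of the Hungarian algorithm (the record does not specify the matcher) are all illustrative assumptions.

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(affinity: np.ndarray, threshold: float = 0.5):
    """Match tracks (rows) to detections (columns) given a probabilistic
    affinity matrix; pairs scoring below `threshold` are rejected.
    NOTE: the matrix would come from a learned affinity model (stubbed here),
    and the threshold value is an assumption, not taken from the paper."""
    rows, cols = linear_sum_assignment(affinity, maximize=True)
    matches = [(r, c) for r, c in zip(rows, cols) if affinity[r, c] >= threshold]
    matched_rows = {r for r, _ in matches}
    matched_cols = {c for _, c in matches}
    # Unmatched tracks are candidates for false-negative propagation
    # (keeping an occluded track alive); unmatched detections are candidates
    # for false-positive elimination or new-track initialization.
    unmatched_tracks = [r for r in range(affinity.shape[0]) if r not in matched_rows]
    unmatched_dets = [c for c in range(affinity.shape[1]) if c not in matched_cols]
    return matches, unmatched_tracks, unmatched_dets

# Example: two tracks, three detections; affinities in [0, 1].
aff = np.array([[0.9, 0.1, 0.2],
                [0.2, 0.3, 0.05]])
print(associate(aff))  # -> ([(0, 0)], [1], [1, 2])

Rejecting low-affinity pairs after the optimal assignment is what turns a pure matching step into lifecycle management: the leftovers on each side feed the false-negative propagation and false-positive elimination behaviors the abstract describes.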
Pages: 4273-4280
Number of pages: 8