TLtrack: Combining Transformers and a Linear Model for Robust Multi-Object Tracking

Cited by: 1
Authors
He, Zuojie [1 ]
Zhao, Kai [2 ]
Zeng, Dan [1 ]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai 200444, Peoples R China
[2] Univ Calif Los Angeles, Dept Radiol, Los Angeles, CA 90095 USA
Funding
National Natural Science Foundation of China;
Keywords
multi-object tracking; motion prediction; transformer; OBJECT TRACKING;
DOI
10.3390/ai5030047
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Multi-object tracking (MOT) aims to estimate the locations and identities of objects in videos. Many modern multi-object tracking systems follow the tracking-by-detection paradigm, which consists of a detector followed by a method for associating detections into tracks. The most basic form of association matches detections using motion-based similarity heuristics. Motion models exploit motion information to estimate future locations and play an important role in improving association performance. Recently, DanceTrack, a large-scale dataset in which objects have uniform appearance and diverse motion patterns, was proposed. Existing hand-crafted motion models struggle to achieve decent results on DanceTrack because they lack the required prior knowledge. In this work, we present a motion-based algorithm named TLtrack, which adopts a hybrid strategy that makes motion estimates based on detection confidence scores. For high-confidence detections, TLtrack employs a transformer to predict their locations. For low-confidence detections, a simple linear model that estimates locations from the trajectory's history is used. TLtrack thus considers not only the historical information of the trajectory but also the latest movements. Experimental results on the DanceTrack dataset show that our method achieves the best performance among the compared motion models.
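As a rough illustration of the hybrid strategy described in the abstract, the short Python sketch below gates between a learned motion predictor and a constant-velocity linear fallback according to detection confidence. The class name HybridMotionEstimator, the 0.6 score threshold, the (cx, cy, w, h) box format, and the constant-velocity fallback are illustrative assumptions and not the paper's implementation; the transformer predictor is passed in as an opaque callable and stubbed in the example.

```python
# Minimal sketch of a confidence-gated hybrid motion estimator.
# All names and the 0.6 threshold are illustrative assumptions, not TLtrack's code.
from typing import Callable, List, Sequence

def linear_prediction(history: Sequence[Sequence[float]]) -> List[float]:
    """Constant-velocity estimate from the last two boxes (cx, cy, w, h)."""
    if len(history) < 2:
        return list(history[-1])
    prev, last = history[-2], history[-1]
    return [l + (l - p) for l, p in zip(last, prev)]

class HybridMotionEstimator:
    def __init__(self,
                 transformer_predict: Callable[[Sequence[Sequence[float]]], List[float]],
                 score_threshold: float = 0.6):
        # transformer_predict: a learned model mapping a trajectory history to the
        # next box; stubbed here, since the paper's model is not reproduced.
        self.transformer_predict = transformer_predict
        self.score_threshold = score_threshold

    def predict(self, history: Sequence[Sequence[float]], det_score: float) -> List[float]:
        # High-confidence detections: use the transformer's motion estimate.
        if det_score >= self.score_threshold:
            return self.transformer_predict(history)
        # Low-confidence detections: fall back to the simple linear model.
        return linear_prediction(history)

if __name__ == "__main__":
    # Dummy "transformer": reuses the linear model so the example stays runnable.
    estimator = HybridMotionEstimator(transformer_predict=linear_prediction)
    track_history = [[100.0, 100.0, 40.0, 80.0], [104.0, 102.0, 40.0, 80.0]]
    print(estimator.predict(track_history, det_score=0.9))  # transformer branch
    print(estimator.predict(track_history, det_score=0.3))  # linear fallback
```

In a full tracker, transformer_predict would be the learned motion model applied to each track's box history, and the predicted boxes would then feed a standard IoU-based association step.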
Citation
Pages: 938-947
Number of pages: 10