QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking

Cited by: 30
Authors
Fischer, Tobias [1 ]
Huang, Thomas E. [1 ]
Pang, Jiangmiao [2 ]
Qiu, Linlu [3 ]
Chen, Haofeng [4 ]
Darrell, Trevor [5 ]
Yu, Fisher [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Dept Informat Technol & Elect Engn, CH-8092 Zurich, Switzerland
[2] Shanghai AI Lab, Shanghai 200030, Peoples R China
[3] MIT, Dept EECS, Cambridge, MA 02139 USA
[4] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
[5] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
Keywords
Feature extraction; Training; Detectors; Object tracking; Head; Three-dimensional displays; Pipelines; Multiple object tracking; quasi-dense similarity learning; MULTITARGET
DOI
10.1109/TPAMI.2023.3301975
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Similarity learning has been recognized as a crucial step for object tracking. However, existing multiple object tracking methods only use sparse ground truth matching as the training objective, while ignoring the majority of the informative regions in images. In this paper, we present Quasi-Dense Similarity Learning, which densely samples hundreds of object regions on a pair of images for contrastive learning. We combine this similarity learning with multiple existing object detectors to build Quasi-Dense Tracking (QDTrack), which does not require displacement regression or motion priors. We find that the resulting distinctive feature space admits a simple nearest neighbor search at inference time for object association. In addition, we show that our similarity learning scheme is not limited to video data, but can learn effective instance similarity even from static input, enabling a competitive tracking performance without training on videos or using tracking supervision. We conduct extensive experiments on a wide variety of popular MOT benchmarks. We find that, despite its simplicity, QDTrack rivals the performance of state-of-the-art tracking methods on all benchmarks and sets a new state-of-the-art on the large-scale BDD100K MOT benchmark, while introducing negligible computational overhead to the detector.
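The association step described in the abstract can be sketched in a few lines: detections are embedded, and each new detection is matched to the track whose embedding is its nearest neighbor. The sketch below uses a bi-directional softmax over a dot-product similarity matrix; the embedding inputs, the `thresh` parameter, and the exact scoring are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def associate(track_embs, det_embs, thresh=0.5):
    """Match detections to tracks by nearest-neighbor search in embedding space.

    track_embs: (T, D) embeddings of existing tracks (hypothetical input;
                in the paper these come from a quasi-dense contrastive head).
    det_embs:   (N, D) embeddings of current-frame detections.
    Returns an (N,) array: matched track index per detection, or -1 when the
    best match score falls below `thresh` (an assumed cutoff for new objects).
    """
    sim = det_embs @ track_embs.T              # (N, T) dot-product similarity
    # Softmax over tracks (per detection) and over detections (per track),
    # averaged: a detection and a track must mutually prefer each other.
    e = np.exp(sim - sim.max())                # shift for numerical stability
    p_track = e / e.sum(axis=1, keepdims=True)
    p_det = e / e.sum(axis=0, keepdims=True)
    score = 0.5 * (p_track + p_det)
    best = score.argmax(axis=1)
    return np.where(score.max(axis=1) >= thresh, best, -1)
```

With two well-separated tracks and two detections close to them, each detection is assigned to its nearest track; an outlier detection whose mutual score stays below the cutoff would start a new track instead.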
Pages: 15380-15393
Page count: 14