Self-supervised multi-object tracking based on metric learning

Cited by: 1
Authors
Feng, Xin [1]
Liu, Yan [2]
Yang, Hanzhi [1]
Jiao, Xiaoning [1]
Liu, Zhi [2]
Affiliations
[1] Chongqing Univ Technol, Coll Comp Sci & Engn, Chongqing 404100, Peoples R China
[2] Chongqing Univ Technol, Sch Artificial Intelligence, Chongqing 404100, Peoples R China
Funding
UK Research and Innovation (UKRI)
Keywords
Multi-object tracking; Joint detection and embedding; Self-supervised learning; Metric learning
DOI
10.1007/s40747-024-01475-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The current paradigm of joint detection and tracking still requires a large amount of instance-level trajectory annotation, which incurs high annotation costs. Moreover, treating embedding training as a classification problem makes the model difficult to fit. In this paper, we propose a new self-supervised multi-object tracking method based on the real-time joint detection and embedding (JDE) framework, which we term self-supervised multi-object tracking (SS-MOT). In SS-MOT, the short-term temporal correlations between objects within and across adjacent video frames are both used as self-supervised constraints: the distances between different objects are enlarged, while the distances between the same object in adjacent frames are reduced. In addition, short trajectories are formed by matching pairs of adjacent frames using a matching algorithm, and the matched pairs are treated as positive samples. The distances between positive samples are then minimized to further strengthen the feature representation of the same object. Therefore, our method can be trained on videos without instance-level annotations. We apply our approach to state-of-the-art JDE models, such as FairMOT, CSTrack, and SiamMOT, and achieve results comparable to these supervised methods on the widely used MOT17 and MOT20 challenges.
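As a rough illustration of the two constraints the abstract describes (pushing apart different objects within a frame, and pulling together matched objects across adjacent frames), the following Python sketch may help. It is not the authors' implementation: the cosine distance, the hinge margin, the greedy matcher standing in for the unspecified "matching algorithm", and all function names and thresholds are assumptions made only for this example.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero embedding vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def greedy_match(prev, curr, max_dist=0.5):
    """Hypothetical stand-in for the matching step: pair embeddings from two
    adjacent frames one-to-one, closest pairs first, skipping pairs that are
    farther apart than max_dist."""
    candidates = sorted(
        (cosine_distance(p, c), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
    )
    used_prev, used_curr, pairs = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # remaining candidates are even farther apart
        if i not in used_prev and j not in used_curr:
            used_prev.add(i)
            used_curr.add(j)
            pairs.append((i, j))
    return sorted(pairs)


def push_loss(frame, margin=0.5):
    """Within one frame every detection is a different object, so penalize
    any pair whose distance is smaller than the (assumed) margin."""
    loss = 0.0
    for i in range(len(frame)):
        for j in range(i + 1, len(frame)):
            loss += max(0.0, margin - cosine_distance(frame[i], frame[j]))
    return loss


def pull_loss(prev, curr, pairs):
    """Matched pairs are treated as positive samples: minimize their distance."""
    return sum(cosine_distance(prev[i], curr[j]) for i, j in pairs)


def self_supervised_loss(prev, curr, margin=0.5, max_dist=0.5):
    """Sum of the within-frame push terms and the cross-frame pull term."""
    pairs = greedy_match(prev, curr, max_dist)
    return push_loss(prev, margin) + push_loss(curr, margin) + pull_loss(prev, curr, pairs)


# Toy embeddings for two objects seen in two adjacent frames,
# with the detection order swapped in the second frame.
prev_frame = [[1.0, 0.0], [0.0, 1.0]]
curr_frame = [[0.1, 0.9], [0.9, 0.1]]
matches = greedy_match(prev_frame, curr_frame)
loss = self_supervised_loss(prev_frame, curr_frame)
```

In this toy case the matcher recovers the swapped order without any identity labels, which is the property that lets such a loss be computed on unannotated video. A real JDE model would compute these terms on learned embeddings and backpropagate through them, which this sketch does not attempt.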
Pages: 7077-7088
Number of pages: 12