Towards Real-Time Multi-Object Tracking

Cited by: 599
Authors
Wang, Zhongdao [1]
Zheng, Liang [2]
Liu, Yixuan [1]
Li, Yali [1]
Wang, Shengjin [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Australian Natl Univ, Canberra, Australia
Source
COMPUTER VISION - ECCV 2020, PT XI | 2020 / Vol. 12356
Keywords
Multi-Object Tracking;
DOI
10.1007/978-3-030-58621-8_7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Modern multiple object tracking (MOT) systems usually follow the tracking-by-detection paradigm, which comprises 1) a detection model for target localization and 2) an appearance embedding model for data association. Executing the two models separately can be inefficient, as the running time is simply the sum of the two steps, without exploiting structures that could be shared between them. Existing research efforts on real-time MOT usually focus on the association step, so they are essentially real-time association methods rather than real-time MOT systems. In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model. Specifically, we incorporate the appearance embedding model into a single-shot detector, such that the model can simultaneously output detections and the corresponding embeddings. We further propose a simple and fast association method that works in conjunction with the joint model. In both components the computation cost is significantly reduced compared with former MOT systems, resulting in a neat and fast baseline for future follow-ups on real-time MOT algorithm design. To our knowledge, this work reports the first (near) real-time MOT system, with a running speed of 22 to 40 FPS depending on the input resolution. Meanwhile, its tracking accuracy is comparable to that of state-of-the-art trackers embodying separate detection and embedding (SDE) learning (64.4% MOTA vs. 66.1% MOTA on the MOT-16 challenge). Code and models are available at https://github.com/Zhongdao/Towards-Realtime-MOT.
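The abstract says each detection comes with an appearance embedding that is then used for data association, but it does not spell out the association method. As a rough illustration of the general idea only, the sketch below matches existing tracks to new detections by cosine distance between embeddings, using a greedy nearest-pair rule. The function names, the greedy rule, and the `max_distance` gate of 0.4 are all assumptions made for this example and are not taken from the paper or its repository.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def associate(track_embeddings, detection_embeddings, max_distance=0.4):
    """Greedily match tracks to detections by ascending embedding distance.

    Hypothetical helper for illustration; not the paper's association method.
    Returns (matches, unmatched_tracks, unmatched_detections), where matches
    is a list of (track_index, detection_index) pairs.
    """
    # Score every track/detection pair, then visit pairs from closest to farthest.
    pairs = sorted(
        (cosine_distance(t, d), ti, di)
        for ti, t in enumerate(track_embeddings)
        for di, d in enumerate(detection_embeddings)
    )
    matches, used_tracks, used_dets = [], set(), set()
    for dist, ti, di in pairs:
        if dist > max_distance:
            break  # every remaining pair is farther apart than the gate
        if ti in used_tracks or di in used_dets:
            continue  # each track and each detection is matched at most once
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    unmatched_tracks = [i for i in range(len(track_embeddings)) if i not in used_tracks]
    unmatched_dets = [i for i in range(len(detection_embeddings)) if i not in used_dets]
    return matches, unmatched_tracks, unmatched_dets

# Toy 2-D embeddings: two existing tracks, three detections in the new frame.
tracks = [[1.0, 0.0], [0.0, 1.0]]
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, 0.0]]
matches, lost_tracks, new_detections = associate(tracks, detections)
```

In this toy case each track pairs with the detection whose embedding points in nearly the same direction, and the third detection is left unmatched, so a tracker could start a new track for it. A real system would likely replace the greedy rule with an optimal assignment solver and combine appearance with motion cues; those choices are outside what the abstract states.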
Pages: 107-122
Page count: 16
Related papers
49 in total
[11]   Recurrent Autoregressive Networks for Online Multi-Object Tracking [J].
Fang, Kuan ;
Xiang, Yu ;
Li, Xiaocheng ;
Savarese, Silvio .
2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, :466-475
[12]   Fast R-CNN [J].
Girshick, Ross .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :1440-1448
[13]  
He KM, 2020, IEEE T PATTERN ANAL, V42, P386, DOI [10.1109/TPAMI.2018.2844175, 10.1109/ICCV.2017.322]
[14]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[15]  
Hermans A, 2017, Arxiv, DOI arXiv:1703.07737
[16]  
Jiang H, 2007, PROC CVPR IEEE, P1604
[17]   Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics [J].
Kendall, Alex ;
Gal, Yarin ;
Cipolla, Roberto .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :7482-7491
[18]   Multiple Hypothesis Tracking Revisited [J].
Kim, Chanho ;
Li, Fuxin ;
Ciptadi, Arridhana ;
Rehg, James M. .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4696-4704
[19]   CornerNet: Detecting Objects as Paired Keypoints [J].
Law, Hei ;
Deng, Jia .
COMPUTER VISION - ECCV 2018, PT XIV, 2018, 11218 :765-781
[20]   Learning an image-based motion context for multiple people tracking [J].
Leal-Taixe, Laura ;
Fenzi, Michele ;
Kuznetsova, Alina ;
Rosenhahn, Bodo ;
Savarese, Silvio .
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :3542-3549