Leveraging temporal-aware fine-grained features for robust multiple object tracking

Cited by: 6
Authors
Wu, Han [1]
Nie, Jiahao [1]
Zhu, Ziming [1]
He, Zhiwei [1, 2]
Gao, Mingyu [1, 2]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Elect Informat, Hangzhou 310018, Zhejiang, Peoples R China
[2] Zhejiang Prov Key Lab Equipment Elect, 2019E10009, Hangzhou, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multiple object tracking; Tracking-by-detection; Critical feature capturing; Temporal-aware feature aggregation
DOI
10.1007/s11227-022-04776-x
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Existing multi-object trackers mainly follow the tracking-by-detection (TBD) paradigm and have achieved remarkable success. However, mainstream methods run their detection networks in isolation and do not exploit the information produced by tracking, so detection and tracking cannot benefit from each other. In this paper, we strengthen tracking performance in complex scenarios by using the rich temporal information derived from the tracking process to enhance the critical features of the current frame. Specifically, we first propose a critical feature capturing network (CFCN) that extracts discriminative features with an adaptive receptive field for each frame. We then design a temporal-aware feature aggregation module (TFAM) that propagates the critical features of previous frames, leveraging temporal information to alleviate the drop in detection quality that occurs when visual cues weaken. Extensive experimental comparisons and analyses on the popular and challenging MOT16, MOT17, and MOT20 benchmarks demonstrate the effectiveness of the proposed method. The results show that our tracker achieves state-of-the-art tracking performance, e.g., an IDF1 of 75.2% and a MOTA of 80.4% on MOT17.
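The abstract reports results as MOTA and IDF1 without defining them. The sketch below is not taken from the paper; it uses the standard definitions of the two metrics, MOTA = 1 - (FN + FP + IDSW) / GT and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The function names, argument names, and input counts are hypothetical and chosen only for illustration.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy, from counts summed over all frames.

    num_gt is the total number of ground-truth boxes over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """ID F1 score: harmonic mean of identity precision and identity recall."""
    denominator = 2 * idtp + idfp + idfn
    if denominator == 0:
        return 0.0  # no identity matches and no errors: define the score as 0
    return 2 * idtp / denominator


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(f"MOTA = {mota(10, 5, 5, 100):.3f}")
    print(f"IDF1 = {idf1(75, 25, 25):.3f}")
```

MOTA penalizes detection errors and identity switches together, while IDF1 measures how consistently identities are preserved, which is why trackers are commonly reported on both.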
Pages: 2910-2931
Number of pages: 22