Transformer-based two-source motion model for multi-object tracking

Cited by: 10
Authors
Yang, Jieming [1 ,2 ]
Ge, Hongwei [1 ,2 ]
Su, Shuzhi [3 ]
Liu, Guoqing [1 ,2 ]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Jiangsu, Peoples R China
[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Jiangsu, Peoples R China
[3] Anhui Univ Sci & Technol, Sch Comp Sci & Engn, Huainan 232001, Anhui, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Deep learning; Neural network; Computer vision; Multi-object tracking; Motion model;
DOI
10.1007/s10489-021-03012-y
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, benefiting from advances in detection models, multi-object tracking methods based on the tracking-by-detection paradigm have greatly improved in performance. However, most methods still rely on traditional motion models for position prediction, such as the constant-velocity model and the Kalman filter. Only a few methods adopt deep networks for prediction, and these use only the simplest RNN (Recurrent Neural Network) to predict position, without accounting for the position offset caused by camera movement. Therefore, inspired by the outstanding performance of the Transformer on temporal tasks, this paper proposes a Transformer-based motion model for multi-object tracking. By taking as input the target's historical position differences and the offset vectors between consecutive frames, the model accounts for the motion of both the target and the camera, which improves the prediction accuracy of the motion model used in the multi-object tracker and thereby the overall tracking performance. Comparative experiments and tracking results on the MOTChallenge benchmarks demonstrate the effectiveness of the proposed method.
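The abstract describes a Transformer that fuses two input streams: the target's historical box differences and per-frame camera-offset vectors. The paper's exact architecture is not given in this record, so the sketch below is only an illustrative reading of that idea; the feature dimensions, concatenation-based fusion, and all layer sizes are assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): a Transformer encoder that predicts
# the next bounding-box difference from two concatenated per-step inputs:
# (a) the target's historical box differences and (b) camera-offset vectors.
import torch
import torch.nn as nn


class TwoSourceMotionModel(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        # Assumed per-step features: 4-D box difference (dx, dy, dw, dh)
        # plus a 2-D inter-frame camera offset, fused by concatenation.
        self.embed = nn.Linear(4 + 2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 4)  # predicted next box difference

    def forward(self, box_diffs, cam_offsets):
        # box_diffs: (B, T, 4); cam_offsets: (B, T, 2)
        x = self.embed(torch.cat([box_diffs, cam_offsets], dim=-1))
        h = self.encoder(x)
        return self.head(h[:, -1])  # predict from the last step's encoding


model = TwoSourceMotionModel()
pred = model(torch.randn(8, 10, 4), torch.randn(8, 10, 2))
print(pred.shape)  # torch.Size([8, 4])
```

The predicted difference would then be added to the target's last known position to obtain the box proposal for the next frame, mirroring how constant-velocity or Kalman-filter predictions are used in tracking-by-detection pipelines.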
Pages: 9967 - 9979
Page count: 13
References
42 total
  • [1] Social LSTM: Human Trajectory Prediction in Crowded Spaces
    Alahi, Alexandre
    Goel, Kratarth
    Ramanathan, Vignesh
    Robicquet, Alexandre
    Li, Fei-Fei
    Savarese, Silvio
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 961 - 971
  • [2] Andriyenko A, 2011, PROC CVPR IEEE, P1265, DOI 10.1109/CVPR.2011.5995311
  • [3] A dual CNN-RNN for multiple people tracking
    Babaee, Maryam
    Li, Zimu
    Rigoll, Gerhard
    [J]. NEUROCOMPUTING, 2019, 368 : 69 - 83
  • [4] Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning
    Bae, Seung-Hwan
    Yoon, Kuk-Jin
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 1218 - 1225
  • [5] Tracking without bells and whistles
    Bergmann, Philipp
    Meinhardt, Tim
    Leal-Taixe, Laura
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 941 - 951
  • [6] Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics
    Bernardin, Keni
    Stiefelhagen, Rainer
    [J]. EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2008, 2008 (1)
  • [7] Bewley A, 2016, IEEE IMAGE PROC, P3464, DOI 10.1109/ICIP.2016.7533003
  • [8] Bochinski Erik, 2017, 2017 14th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), DOI 10.1109/AVSS.2017.8078516
  • [9] Carion N., 2020, EUROPEAN C COMPUTER, V12346, P213, DOI 10.1007/978-3-030-58452-8_13
  • [10] Chen M, 2020, PR MACH LEARN RES, V119