Transformer-based two-source motion model for multi-object tracking

Cited by: 10
Authors
Yang, Jieming [1 ,2 ]
Ge, Hongwei [1 ,2 ]
Su, Shuzhi [3 ]
Liu, Guoqing [1 ,2 ]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Jiangsu, Peoples R China
[2] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Jiangsu, Peoples R China
[3] Anhui Univ Sci & Technol, Sch Comp Sci & Engn, Huainan 232001, Anhui, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Deep learning; Neural network; Computer vision; Multi-object tracking; Motion model;
DOI
10.1007/s10489-021-03012-y
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, benefiting from advances in detection models, multi-object tracking methods based on the tracking-by-detection paradigm have greatly improved in performance. However, most methods still rely on traditional motion models for position prediction, such as the constant-velocity model and the Kalman filter. Only a few methods adopt deep networks for prediction, and these use only the simplest RNN (Recurrent Neural Network) to predict position, without accounting for the position offset caused by camera movement. Therefore, inspired by the outstanding performance of the Transformer on temporal tasks, this paper proposes a Transformer-based motion model for multi-object tracking. By taking as input the target's historical position differences and the offset vectors between consecutive frames, the model accounts for the motion of both the target and the camera, which improves the prediction accuracy of the motion model used in the multi-object tracker and thereby the overall tracking performance. Comparative experiments and tracking results on the MOTChallenge benchmarks demonstrate the effectiveness of the proposed method.
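The abstract describes a Transformer that fuses two input streams: the target's historical box differences and per-frame camera-offset vectors. The paper's exact architecture is not given in this record, so the sketch below is only an illustrative reading of that idea; the feature dimensions, concatenation-based fusion, and all layer sizes are assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): a Transformer encoder that predicts
# the next bounding-box difference from two concatenated per-step inputs:
# (a) the target's historical box differences and (b) camera-offset vectors.
import torch
import torch.nn as nn


class TwoSourceMotionModel(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        # Assumed per-step features: 4-D box difference (dx, dy, dw, dh)
        # plus a 2-D inter-frame camera offset, fused by concatenation.
        self.embed = nn.Linear(4 + 2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 4)  # predicted next box difference

    def forward(self, box_diffs, cam_offsets):
        # box_diffs: (B, T, 4); cam_offsets: (B, T, 2)
        x = self.embed(torch.cat([box_diffs, cam_offsets], dim=-1))
        h = self.encoder(x)
        return self.head(h[:, -1])  # predict from the last step's encoding


model = TwoSourceMotionModel()
pred = model(torch.randn(8, 10, 4), torch.randn(8, 10, 2))
print(pred.shape)  # torch.Size([8, 4])
```

The predicted difference would then be added to the target's last known position to obtain the box proposal for the next frame, mirroring how constant-velocity or Kalman-filter predictions are used in tracking-by-detection pipelines.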
Pages: 9967 - 9979
Page count: 13
References
42 total
  • [1] Social LSTM: Human Trajectory Prediction in Crowded Spaces
    Alahi, Alexandre
    Goel, Kratarth
    Ramanathan, Vignesh
    Robicquet, Alexandre
    Li, Fei-Fei
    Savarese, Silvio
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 961 - 971
  • [2] Andriyenko A, 2011, PROC CVPR IEEE, P1265, DOI 10.1109/CVPR.2011.5995311
  • [3] A dual CNN-RNN for multiple people tracking
    Babaee, Maryam
    Li, Zimu
    Rigoll, Gerhard
    [J]. NEUROCOMPUTING, 2019, 368 : 69 - 83
  • [4] Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning
    Bae, Seung-Hwan
    Yoon, Kuk-Jin
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 1218 - 1225
  • [5] Tracking without bells and whistles
    Bergmann, Philipp
    Meinhardt, Tim
    Leal-Taixe, Laura
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 941 - 951
  • [6] Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics
    Bernardin, Keni
    Stiefelhagen, Rainer
    [J]. EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2008, 2008 (1)
  • [7] Bewley A, 2016, IEEE IMAGE PROC, P3464, DOI 10.1109/ICIP.2016.7533003
  • [8] Bochinski Erik, 2017, 2017 14th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), DOI 10.1109/AVSS.2017.8078516
  • [9] Carion N., 2020, EUROPEAN C COMPUTER, V12346, P213, DOI 10.1007/978-3-030-58452-8_13
  • [10] Chen M, 2020, PR MACH LEARN RES, V119