MotionFormer: An Improved Transformer-Based Architecture for Multi-object Tracking

Cited by: 0

Authors:

Agrawal, Harshit [1]

Halder, Agrya [1]

Chattopadhyay, Pratik [1]

Affiliations:

[1] Indian Inst Technol BHU, Varanasi, Uttar Pradesh, India

Source:

COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III | 2024 / Vol. 2011

Keywords:

Multi Object Tracking; Computer Vision; Motion Prediction; MotionFormer; Kalman Filter

DOI:

10.1007/978-3-031-58535-7_18

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Multi-object tracking (MOT) is a crucial task in computer vision with numerous real-world applications. In recent years, Transformer-based models have shown promising results in MOT. However, existing methods still struggle in scenarios involving short-term object occlusion, camera motion, and ambiguous detections, especially in low-frame-rate videos. In this work, we propose a novel variant of the TrackFormer model that addresses these limitations by integrating an online motion prediction module based on the Kalman Filter, which incorporates the temporal information present in videos. The Kalman Filter helps the model learn the motion patterns of tracked pedestrians and use them to associate targets across video frames, even under short-term occlusion, without adding much to the computational complexity of the overall framework. The proposed model is termed MotionFormer because of its inherent ability to exploit long-term motion information. Through extensive evaluations on two popular tracking datasets, MOT-17 and MOT-20, we demonstrate the effectiveness of our approach over existing approaches. Results show that MotionFormer provides reasonably good tracking accuracy with a much lower identity-switch rate than the other models. Further, it significantly outperforms the base TrackFormer model in tracking accuracy, F1-score, and identity-switch rate on MOT-17 private and public data and on MOT-20 private data.
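As an illustration of how a Kalman Filter can carry a track through a short occlusion, the following is a minimal sketch and not the authors' implementation. It assumes a constant-velocity model on a single coordinate (for example, the x-centre of a bounding box), and the class name `ConstantVelocityKF1D` and the noise values `q` and `r` are hypothetical choices made only for this example; the abstract does not specify the state model or parameters used in MotionFormer.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKF1D:
    """Kalman filter for one coordinate with state (position, velocity).

    Illustrative only: the constant-velocity model and the noise
    values below are assumptions, not taken from the paper.
    """
    pos: float
    vel: float = 0.0
    # Entries of the symmetric 2x2 state covariance P.
    p00: float = 10.0
    p01: float = 0.0
    p11: float = 10.0
    q: float = 0.01  # process noise added to each diagonal entry (assumed)
    r: float = 1.0   # measurement noise variance (assumed)

    def predict(self, dt: float = 1.0) -> float:
        """Advance the state by one frame and return the predicted position."""
        # x <- F x, with F = [[1, dt], [0, 1]]
        self.pos += self.vel * dt
        # P <- F P F^T + Q
        p00 = self.p00 + dt * (2.0 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def update(self, z: float) -> None:
        """Correct the state with a measured position z (H = [1, 0])."""
        s = self.p00 + self.r          # innovation variance
        k0 = self.p00 / s              # Kalman gain for position
        k1 = self.p01 / s              # Kalman gain for velocity
        y = z - self.pos               # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        # P <- (I - K H) P
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11


# A target moving 2 pixels per frame is detected at frames 1 to 4.
kf = ConstantVelocityKF1D(pos=0.0)
for z in (2.0, 4.0, 6.0, 8.0):
    kf.predict()
    kf.update(z)

# Frames 5 and 6 are occluded: no detection arrives, so the filter
# only predicts, and the track coasts along its estimated velocity.
occluded_1 = kf.predict()
occluded_2 = kf.predict()
```

During the occluded frames the predicted positions stay close to where the target would be under uniform motion, which is what lets a tracker re-associate the identity once the detection reappears. A full tracker would typically run such a filter over all bounding-box coordinates rather than a single one.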


Pages: 212-224

Page count: 13

