MotionFormer: An Improved Transformer-Based Architecture for Multi-object Tracking

Cited by: 0

Authors:

Agrawal, Harshit [1]

Halder, Agrya [1]

Chattopadhyay, Pratik [1]

Affiliations:

[1] Indian Inst Technol BHU, Varanasi, Uttar Pradesh, India

Source:

COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III | 2024 / Vol. 2011

Keywords:

Multi Object Tracking; Computer Vision; Motion Prediction; MotionFormer; Kalman Filter;

DOI:

10.1007/978-3-031-58535-7_18

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multi-object tracking (MOT) is a crucial task in computer vision with numerous real-world applications. In recent years, Transformer-based models have shown promising results in MOT. However, existing methods still struggle in scenarios involving short-term object occlusion, camera motion, and ambiguous detections, especially in low-frame-rate videos. In this work, we propose a novel variant of the TrackFormer model that addresses these limitations by integrating an online motion prediction module based on the Kalman Filter, which incorporates the temporal information present in the videos. The Kalman Filter helps the model learn the motion patterns of tracked pedestrians and leverage them to associate targets across video frames, even under short-term occlusion, without adding much to the computational complexity of the overall framework. The proposed model is termed MotionFormer due to its inherent ability to utilize long-term motion information. Through extensive evaluations on popular tracking datasets, namely MOT-17 and MOT-20, we demonstrate the effectiveness of our approach over existing approaches. Results show that MotionFormer provides reasonably good tracking accuracy with a much lower identity-switch rate than the other models. Further, it significantly outperforms the base TrackFormer model in terms of tracking accuracy, F1-score, and identity-switch rate on the MOT-17 private and public and MOT-20 private data.
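The abstract does not give the state vector, noise settings, or the way the filter is coupled to TrackFormer, so the following is only a minimal sketch of the general idea: a constant-velocity Kalman filter over a single coordinate (for example, the x-position of a box center) that keeps predicting while detections are missing. The class name `ConstantVelocityKalman1D`, the helper `coast_through_occlusion`, and the noise parameters `q` and `r` are hypothetical and are not taken from the paper.

```python
class ConstantVelocityKalman1D:
    """Hypothetical 1-D constant-velocity Kalman filter (state: position, velocity)."""

    def __init__(self, position, q=0.01, r=1.0):
        self.p = float(position)   # estimated position
        self.v = 0.0               # estimated velocity (unknown at start)
        # Covariance entries; velocity starts highly uncertain.
        self.P00, self.P01, self.P10, self.P11 = 1.0, 0.0, 0.0, 100.0
        self.q = q                 # assumed process noise (added to the diagonal)
        self.r = r                 # assumed measurement noise variance

    def predict(self, dt=1.0):
        """Propagate the state one frame ahead: p <- p + v*dt, P <- F P F^T + Q."""
        self.p += self.v * dt
        a = self.P00 + dt * self.P10
        b = self.P01 + dt * self.P11
        self.P00 = a + dt * b + self.q
        self.P01 = b
        self.P10 = self.P10 + dt * self.P11
        self.P11 = self.P11 + self.q
        return self.p

    def update(self, z):
        """Correct the prediction with a measured position z (H = [1, 0])."""
        s = self.P00 + self.r                 # innovation variance
        k0, k1 = self.P00 / s, self.P10 / s   # Kalman gain
        y = z - self.p                        # innovation
        self.p += k0 * y
        self.v += k1 * y
        # P <- (I - K H) P, computed from the pre-update entries.
        P00, P01, P10, P11 = self.P00, self.P01, self.P10, self.P11
        self.P00 = (1.0 - k0) * P00
        self.P01 = (1.0 - k0) * P01
        self.P10 = P10 - k1 * P00
        self.P11 = P11 - k1 * P01


def coast_through_occlusion(measurements, missed_frames):
    """Fit the filter on observed positions, then predict through missed frames."""
    kf = ConstantVelocityKalman1D(measurements[0])
    for z in measurements[1:]:
        kf.predict()
        kf.update(z)
    # No detections available: rely on the motion model alone.
    return [kf.predict() for _ in range(missed_frames)]


if __name__ == "__main__":
    observed = [2.0 * t for t in range(10)]   # target moving 2 units per frame
    print(coast_through_occlusion(observed, 3))
```

In a tracker, predictions of this kind could be compared against new detections to keep an identity alive across a short occlusion; how MotionFormer actually does this is not described in the abstract.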


Pages: 212-224

Page count: 13

33 references in total

[1] MeMOT: Multi-Object Tracking with Memory
Cai, Jiarui
Xu, Mingze
Li, Wei
Xiong, Yuanjun
Xia, Wei
Tu, Zhuowen
Soatto, Stefano
[J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8080 - 8090
[2] TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking
Chu, Peng
Wang, Jiang
You, Quanzeng
Ling, Haibin
Liu, Zicheng
[J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 4859 - 4869
[3] Dendorfer P., 2020, arXiv, DOI 10.48550/arXiv.2003.09003
[4] Dubuisson B., 1973, J. Basic Eng, V83, P95, DOI 10.1115/1.3658902
[5] Galor A., 2024, arXiv, arXiv:2210.13570
[6] Deep Residual Learning for Image Recognition
He, Kaiming
Zhang, Xiangyu
Ren, Shaoqing
Sun, Jian
[J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
[7] Transformers in Vision: A Survey
Khan, Salman
Naseer, Muzammal
Hayat, Munawar
Zamir, Syed Waqas
Khan, Fahad Shahbaz
Shah, Mubarak
[J]. ACM COMPUTING SURVEYS, 2022, 54 (10S)
[8] Modeling Human Memory in Multi-Object Tracking with Transformers
Li, Yizhuo
Lu, Cewu
[J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2849 - 2853
[9] SegDQ: Segmentation assisted multi-object tracking with dynamic query-based transformers
Liu, Yating
Bai, Tianxiang
Tian, Yonglin
Wang, Yutong
Wang, Jiangong
Wang, Xiao
Wang, Fei-Yue
[J]. NEUROCOMPUTING, 2022, 481 : 91 - 101
[10] TrackFormer: Multi-Object Tracking with Transformers
Meinhardt, Tim
Kirillov, Alexander
Leal-Taixe, Laura
Feichtenhofer, Christoph
[J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8834 - 8844
