MotionFormer: An Improved Transformer-Based Architecture for Multi-object Tracking

Cited by: 0
Authors
Agrawal, Harshit [1]
Halder, Agrya [1]
Chattopadhyay, Pratik [1]
Affiliations
[1] Indian Inst Technol BHU, Varanasi, Uttar Pradesh, India
Source
COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III | 2024 / Vol. 2011
Keywords
Multi Object Tracking; Computer Vision; Motion Prediction; MotionFormer; Kalman Filter
DOI
10.1007/978-3-031-58535-7_18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-object tracking (MOT) is a crucial task in Computer Vision with numerous real-world applications. In recent years, Transformer-based models have shown promising results in MOT. However, existing methods still face challenges in scenarios involving short-term object occlusion, camera motion, and ambiguous detections, especially in low frame-rate videos. In this work, we propose a novel variant of the TrackFormer model that addresses these limitations by integrating an online motion prediction module based on the Kalman Filter, which incorporates the important temporal information present in the videos. The Kalman Filter helps the model learn the motion patterns of the tracked pedestrians and leverage them for effective association among targets across video frames, even under short-term occlusions, without adding much to the computational complexity of the overall framework. The proposed model is termed MotionFormer due to its inherent ability to utilize long-term motion information. Through extensive evaluations on two popular tracking datasets, MOT-17 and MOT-20, we demonstrate the effectiveness of our approach over existing approaches. Results show that MotionFormer provides reasonably good tracking accuracy with a much lower identity-switch rate than the other models. Further, it significantly outperforms the base TrackFormer model in terms of tracking accuracy, F1-score, and identity-switch rate on the MOT-17 private and public and the MOT-20 private data.
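The abstract does not specify the state model of the motion module, so the following is only a minimal sketch of the general idea, assuming a constant-velocity Kalman filter over a single box coordinate with position-only measurements. The names `ConstantVelocityKalman1D` and `track_positions`, and the noise values, are hypothetical and are not taken from the paper. It illustrates how a track can coast through frames with no detection (represented here by `None`) by relying on the predicted motion.

```python
class ConstantVelocityKalman1D:
    """Kalman filter over state [position, velocity], measuring position only.

    Illustrative assumption: one coordinate of a box centre, unit time step.
    """

    def __init__(self, first_position, meas_var=1.0, process_var=0.01):
        self.pos = float(first_position)
        self.vel = 0.0  # velocity is unknown when the track starts
        # 2x2 covariance; the large velocity variance encodes "unknown".
        self.P = [[meas_var, 0.0], [0.0, 100.0]]
        self.meas_var = meas_var
        self.process_var = process_var

    def predict(self, dt=1.0):
        """x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]]."""
        self.pos += dt * self.vel
        (p00, p01), (p10, p11) = self.P
        a00, a01 = p00 + dt * p10, p01 + dt * p11  # first row of F P
        self.P = [
            [a00 + dt * a01 + self.process_var, a01],
            [p10 + dt * p11, p11 + self.process_var],
        ]
        return self.pos

    def update(self, measured_position):
        """Correct the prediction with a position measurement (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        innovation = measured_position - self.pos
        s = p00 + self.meas_var          # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        # P <- (I - K H) P
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


def track_positions(detections):
    """Return one position estimate per frame; None marks a missed detection."""
    kf = None
    estimates = []
    for z in detections:
        if kf is None:
            if z is None:
                estimates.append(None)  # no track yet, nothing to predict
                continue
            kf = ConstantVelocityKalman1D(z)
        else:
            kf.predict()
            if z is not None:
                kf.update(z)  # occluded frames skip the update and coast
        estimates.append(kf.pos)
    return estimates


if __name__ == "__main__":
    # Target moves 2 units per frame, then is occluded for two frames.
    frames = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, None, None]
    print(track_positions(frames))
```

In a full tracker the predicted position would feed the association step between existing tracks and new detections; how MotionFormer couples the prediction with TrackFormer's queries is not described in the abstract and is not modelled here.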
Pages: 212-224
Number of pages: 13
Related papers
33 in total
  • [1] MeMOT: Multi-Object Tracking with Memory
    Cai, Jiarui
    Xu, Mingze
    Li, Wei
    Xiong, Yuanjun
    Xia, Wei
    Tu, Zhuowen
    Soatto, Stefano
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8080 - 8090
  • [2] TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking
    Chu, Peng
    Wang, Jiang
    You, Quanzeng
    Ling, Haibin
    Liu, Zicheng
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 4859 - 4869
  • [3] Dendorfer P., 2020, arXiv, DOI 10.48550/arXiv.2003.09003
  • [4] Dubuisson B., 1973, J. Basic Eng, V83, P95, DOI 10.1115/1.3658902
  • [5] Galor A., 2024, arXiv:2210.13570
  • [6] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
  • [7] Transformers in Vision: A Survey
    Khan, Salman
    Naseer, Muzammal
    Hayat, Munawar
    Zamir, Syed Waqas
    Khan, Fahad Shahbaz
    Shah, Mubarak
    [J]. ACM COMPUTING SURVEYS, 2022, 54 (10S)
  • [8] Modeling Human Memory in Multi-Object Tracking with Transformers
    Li, Yizhuo
    Lu, Cewu
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2849 - 2853
  • [9] SegDQ: Segmentation assisted multi-object tracking with dynamic query-based transformers
    Liu, Yating
    Bai, Tianxiang
    Tian, Yonglin
    Wang, Yutong
    Wang, Jiangong
    Wang, Xiao
    Wang, Fei-Yue
    [J]. NEUROCOMPUTING, 2022, 481 : 91 - 101
  • [10] TrackFormer: Multi-Object Tracking with Transformers
    Meinhardt, Tim
    Kirillov, Alexander
    Leal-Taixe, Laura
    Feichtenhofer, Christoph
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8834 - 8844