MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking

Cited by: 18
Authors
Gao, Ruopeng [1]
Wang, Limin [1, 2]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Shanghai AI Lab, Shanghai, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023
Funding
National Natural Science Foundation of China; National Key Research and Development Program
DOI
10.1109/ICCV51070.2023.00908
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a video task, Multiple Object Tracking (MOT) is expected to capture temporal information of targets effectively. Unfortunately, most existing methods only explicitly exploit the object features between adjacent frames, while lacking the capacity to model long-term temporal information. In this paper, we propose MeMOTR, a long-term memory-augmented Transformer for multi-object tracking. Our method is able to make the same object's track embedding more stable and distinguishable by leveraging long-term memory injection with a customized memory-attention layer. This significantly improves the target association ability of our model. Experimental results on DanceTrack show that MeMOTR impressively surpasses the state-of-the-art method by 7.9% and 13.0% on HOTA and AssA metrics, respectively. Furthermore, our model also outperforms other Transformer-based methods on association performance on MOT17 and generalizes well on BDD100K. Code is available at https://github.com/MCG-NJU/MeMOTR.
Pages: 9867-9876
Page count: 10