MULTI-OBJECT TRACKING AS ATTENTION MECHANISM

Cited by: 7
Authors
Fukui, Hiroshi [1]
Miyagawa, Taiki [1]
Morishita, Yusuke [1]
Affiliation
[1] NEC Corp Ltd, Tokyo, Japan
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Keywords
Multi-object tracking; Attention mechanism; Cross-attention mechanism; Real-time processing; End-to-end MOT model;
DOI
10.1109/ICIP49359.2023.10222207
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a conceptually simple, and therefore fast, multi-object tracking (MOT) model that does not require any attached modules, such as the Kalman filter, Hungarian algorithm, transformer blocks, or graph networks. Conventional MOT models are built upon the multi-step modules listed above, and thus their computational cost is high. Our proposed end-to-end MOT model, TicrossNet, is composed of only a base detector and a cross-attention module. As a result, the overhead of tracking does not increase significantly even when the number of instances (N_t) increases. We show that TicrossNet runs in real time; specifically, it achieves 32.6 FPS on MOT17 and 31.0 FPS on MOT20 (Tesla V100), which include more than 100 instances per frame. We also demonstrate that TicrossNet is robust to N_t; thus, the size of the base detector does not have to be changed depending on N_t, as is often done by other models for real-time processing.
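The abstract does not spell out the cross-attention module, so the following is only a minimal sketch of the general idea of associating instances across frames with attention weights. It is not TicrossNet's actual implementation. The function name `cross_attention_match`, the `temperature` and `min_weight` parameters, and the toy feature vectors are all assumptions made for illustration. Each current-frame instance feature acts as a query, the previous-frame instance features act as keys, and a softmax over the scaled dot products yields soft assignment weights; no Kalman filter or Hungarian algorithm is involved.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_match(prev_feats, curr_feats, temperature=0.1, min_weight=0.5):
    """Illustrative association step (hypothetical, not the authors' code).

    prev_feats: feature vectors of instances in frame t-1 (keys).
    curr_feats: feature vectors of instances in frame t (queries).
    Returns, for each current instance, the index of the previous
    instance with the highest attention weight, or None when no
    weight reaches min_weight (treated here as a new track).
    """
    if not prev_feats:
        return [None] * len(curr_feats)
    matches = []
    for query in curr_feats:
        # Scaled dot-product similarity between the query and every key.
        scores = [
            sum(a * b for a, b in zip(query, key)) / temperature
            for key in prev_feats
        ]
        weights = softmax(scores)
        best = max(range(len(weights)), key=weights.__getitem__)
        matches.append(best if weights[best] >= min_weight else None)
    return matches


# Toy usage with made-up 2-D features.
prev = [[1.0, 0.0], [0.0, 1.0]]
curr = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(cross_attention_match(prev, curr, min_weight=0.6))  # [0, 1, None]
```

In this sketch the association cost is a single similarity matrix followed by a softmax, which is one way to read the abstract's claim that tracking overhead grows slowly with N_t. How the real model handles occlusion, track birth, and track death is not stated in this record.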
Pages: 505-509
Page count: 5