Joint spatio-temporal modeling for visual tracking

Cited by: 5
Authors
Sun, Yumei [1, 2, 3, 4, 5]
Tang, Chuanming [1, 2, 3, 4, 5]
Luo, Hui [1, 2, 3, 4, 5]
Li, Qingqing [1, 2, 3, 5]
Peng, Xiaoming [5]
Zhang, Jianlin [1, 2, 3, 4, 5]
Li, Meihui [1, 2, 3, 5]
Wei, Yuxing [1, 2, 3, 5]
Affiliations
[1] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 108408, Peoples R China
[2] Chinese Acad Sci, Key Lab Opt Engn, Chengdu 610209, Peoples R China
[3] Chinese Acad Sci, Inst Opt & Elect, Chengdu 610209, Peoples R China
[4] Chinese Acad Sci, Natl Key Lab Opt Field Manipulat Sci & Technol, Chengdu 610209, Peoples R China
[5] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu 611731, Peoples R China
Keywords
Visual tracking; Siamese trackers; Sequence prediction; Spatio-temporal model
DOI
10.1016/j.knosys.2023.111206
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Similarity-based approaches have made significant progress in visual object tracking (VOT). Although these methods work well in simple scenes, they ignore the continuous spatio-temporal connection of the object across the video sequence. As a result, tracking by spatial matching alone can fail in the presence of distractors and occlusion. In this paper, we propose a joint spatio-temporal modeling tracker named STTrack, which implicitly builds continuous connections between the temporal and spatial aspects of the sequence. Specifically, we first design a time-sequence iteration strategy (TSIS) that concentrates on the temporal connection of the object in the video sequence. Then, we propose a novel spatial temporal interaction Transformer network (STIN) to capture the spatio-temporal correlation of the object between frames. The proposed STIN module is robust to object occlusion because it explores the dependencies among the object's dynamic state changes. Finally, we introduce a spatio-temporal query that suppresses distractors by iteratively propagating the target prior. Extensive experiments on six tracking benchmark datasets demonstrate that the proposed STTrack achieves excellent performance while operating in real time. The code is publicly available at https://github.com/nubsym/STTrack.
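The idea of iteratively propagating a target prior through a query can be illustrated with a toy sketch. The code below is not the STTrack implementation and does not reproduce TSIS or STIN; it only assumes a single query vector that attends over per-frame feature tokens with scaled dot-product attention and is then blended with the attended feature before the next frame. The names `attend`, `track_with_query`, and `momentum` are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Scaled dot-product attention of one query over a list of tokens.

    Returns the attention weights and the weighted sum of the tokens.
    """
    d = len(query)
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(d)
        for tok in tokens
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * tok[i] for w, tok in zip(weights, tokens))
        for i in range(d)
    ]
    return weights, pooled


def track_with_query(frames, init_query, momentum=0.5):
    """Carry a query from frame to frame (illustrative assumption only).

    For each frame, the token with the highest attention weight is taken
    as the target location, and the query is updated as a blend of its
    previous value and the attended feature.
    """
    query = list(init_query)
    picks = []
    for tokens in frames:
        weights, pooled = attend(query, tokens)
        picks.append(max(range(len(weights)), key=weights.__getitem__))
        query = [
            momentum * q + (1.0 - momentum) * p
            for q, p in zip(query, pooled)
        ]
    return picks, query


if __name__ == "__main__":
    # Two frames with two tokens each; the target-like token changes
    # position between the first and the second frame.
    frames = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [0.9, 0.1]],
    ]
    picks, final_query = track_with_query(frames, init_query=[4.0, 0.0])
    print(picks, final_query)
```

Because the query is updated from what it attended to in earlier frames, a token that only resembles the background has little influence on later frames. This is one simple way to read the phrase "propagating the target prior"; the paper's actual mechanism may differ.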
Pages: 10