Joint spatio-temporal modeling for visual tracking

Times cited: 6

Authors:

Sun, Yumei [1,2,3,4,5]

Tang, Chuanming [1,2,3,4,5]

Luo, Hui [1,2,3,4,5]

Li, Qingqing [1,2,3,5]

Peng, Xiaoming [5]

Zhang, Jianlin [1,2,3,4,5]

Li, Meihui [1,2,3,5]

Wei, Yuxing [1,2,3,5]

Affiliations:

[1] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 108408, Peoples R China

[2] Chinese Acad Sci, Key Lab Opt Engn, Chengdu 610209, Peoples R China

[3] Chinese Acad Sci, Inst Opt & Elect, Chengdu 610209, Peoples R China

[4] Chinese Acad Sci, Natl Key Lab Opt Field Manipulat Sci & Technol, Chengdu 610209, Peoples R China

[5] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu 611731, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2024, Vol. 283

Keywords:

Visual tracking; Siamese trackers; Sequence prediction; Spatio-temporal model

DOI:

10.1016/j.knosys.2023.111206

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Similarity-based approaches have made significant progress in visual object tracking (VOT). Although these methods work well in simple scenes, they ignore the continuous spatio-temporal connection of the object across the video sequence. As a result, trackers that rely on spatial matching alone can fail in the presence of distractors and occlusion. In this paper, we propose a spatio-temporal joint-modeling tracker named STTrack, which implicitly builds continuous connections between the temporal and spatial aspects of the sequence. Specifically, we first design a time-sequence iteration strategy (TSIS) that focuses on the temporal connection of the object in the video sequence. We then propose a novel spatial temporal interaction Transformer network (STIN) to capture the spatio-temporal correlation of the object between frames. The STIN module is robust to object occlusion because it explores the dependencies in the object's dynamic state changes. Finally, we introduce a spatio-temporal query that suppresses distractors by iteratively propagating the target prior. Extensive experiments on six tracking benchmark datasets demonstrate that STTrack achieves excellent performance while running in real time. The code is publicly available at https://github.com/nubsym/STTrack.
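The abstract's idea of a query that carries a target prior from frame to frame can be illustrated with a toy sketch. This is not STTrack's architecture. It assumes plain single-head scaled dot-product attention with no learned projections, treats each frame as a list of search-region feature vectors, and stands in for "iteratively propagating the target prior" with a hypothetical momentum blend of the previous query and its attended output. The helpers `attend` and `track` and the `momentum` parameter are illustrative names only, not part of the released code.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, tokens: List[Vector]) -> Tuple[Vector, List[float]]:
    """Single-head scaled dot-product attention; tokens act as both keys and values."""
    d = len(query)
    weights = softmax([dot(query, t) / math.sqrt(d) for t in tokens])
    # Attention output: weighted sum of the token vectors.
    out = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]
    return out, weights


def track(template: Vector, frames: List[List[Vector]], momentum: float = 0.5) -> List[int]:
    """Pick the most-attended token in each frame while carrying the query forward.

    `momentum` (hypothetical) controls how much of the previous query,
    i.e. the target prior, survives into the next frame.
    """
    query = list(template)
    picks = []
    for tokens in frames:
        attended, weights = attend(query, tokens)
        picks.append(max(range(len(weights)), key=weights.__getitem__))
        # Update the prior: blend the old query with what it attended to here.
        query = [momentum * q + (1.0 - momentum) * a for q, a in zip(query, attended)]
    return picks


if __name__ == "__main__":
    template = [1.0, 0.0]
    frames = [
        [[1.0, 0.0], [0.0, 1.0]],  # frame 1: target is token 0
        [[0.9, 0.1], [0.1, 0.9]],  # frame 2: target appearance drifts slightly
    ]
    print(track(template, frames))  # [0, 0]
```

In this toy, a `momentum` close to 1 keeps the query near the initial template, while lower values let it follow the object's appearance in recent frames at the risk of drifting onto a distractor.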

Page count: 10

References: 50 in total
