Adaptive Multi-Scale Transformer Tracker for Satellite Videos

Cited by: 0

Authors:

Zhang, Xin [1]

Jiao, Licheng [1]

Li, Lingling [1]

Liu, Xu [1]

Liu, Fang [1]

Ma, Wenping [1]

Yang, Shuyuan [1]

Affiliations:

[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Sch Artificial Intelligence, Minist Educ China, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Transformers; Satellites; Target tracking; Videos; Video tracking; Computational modeling; Adaptive Transformer; multi-scale Transformer (MT); object regression; satellite video tracking; OBJECT TRACKING;

DOI:

10.1109/TGRS.2024.3441038

Chinese Library Classification (CLC):

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification codes:

0708 ; 070902 ;

Abstract:

Satellite video tracking is often characterized by blurred foreground boundaries in vast scenes, targets that vary widely in scale, and irregular changes in appearance. These challenges make it significantly harder to optimize a robust tracker. It is therefore imperative to extract diverse features, with dynamic adaptive learning capability, for the target tracked in each sequence. In this article, we explore a novel adaptive multi-scale Transformer (MT) tracker for satellite videos that exploits the potential spatiotemporal information of the target effectively. Specifically, a multi-scale spatial Transformer (MSST) is designed to leverage stage-by-stage spatial reduction and channel doubling, thereby enhancing the representation of the tracked target. For dynamic feature learning, an adaptive temporal Transformer (ATT) based on multiple cross attentions is then introduced to provide adaptive learning capacity for the dynamic target. Through learnable parameters, it automatically determines the weight proportion of the different attentions for the specific sequence. Finally, a multi-scale feature (MSF) regression module is crafted to improve the positioning accuracy of targets with low pixel counts in satellite scenes. This module regresses target boxes precisely by effectively fusing features from diverse stages. We evaluate the proposed tracker on several public satellite datasets, including SatSOT, SV248S, and VISO. Experimental results show that our model performs comparably to state-of-the-art trackers.
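
The two mechanisms named in the abstract, stage-by-stage spatial reduction with channel doubling and learnable weighting of several attention outputs, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `stage_shapes` and `fuse_attention_outputs`, the spatial reduction factor of 2, the example input sizes, and the use of a softmax to normalize the learnable weights are all assumptions made for illustration; the abstract specifies none of them.

```python
import math


def stage_shapes(height, width, channels, num_stages):
    """Return the (H, W, C) feature shape at each stage.

    Assumption: each stage halves the spatial size. The abstract only
    says "spatial reduction", so the factor of 2 is hypothetical.
    Channel doubling is stated in the abstract.
    """
    shapes = []
    for _ in range(num_stages):
        shapes.append((height, width, channels))
        height, width, channels = height // 2, width // 2, channels * 2
    return shapes


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attention_outputs(outputs, logits):
    """Weighted sum of several attention outputs of equal length.

    `logits` stands in for the learnable parameters. Normalizing them
    with a softmax is an assumption, not taken from the paper.
    """
    weights = softmax(logits)
    dim = len(outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(dim)
    ]


# Hypothetical input: a 64x64 feature map with 32 channels, 4 stages.
print(stage_shapes(64, 64, 32, 4))
# Two toy attention outputs with equal (untrained) weights.
print(fuse_attention_outputs([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In a trained model the logits would be updated by gradient descent, so the mix of attentions could differ from one sequence to another; here they are fixed constants.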

Pages: 16
