Adaptive Multi-Scale Transformer Tracker for Satellite Videos

Cited by: 0

Authors:

Zhang, Xin ^{[1]}

Jiao, Licheng ^{[1]}

Li, Lingling ^{[1]}

Liu, Xu ^{[1]}

Liu, Fang ^{[1]}

Ma, Wenping ^{[1]}

Yang, Shuyuan ^{[1]}

Affiliations:

[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Sch Artificial Intelligence, Minist Educ China, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Transformers; Satellites; Target tracking; Videos; Video tracking; Computational modeling; Adaptive Transformer; multi-scale Transformer (MT); object regression; satellite video tracking; OBJECT TRACKING;

DOI:

10.1109/TGRS.2024.3441038

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708 ; 070902 ;

Abstract:

Satellite video tracking tasks are often characterized by blurred foreground boundaries in vast scenes, a wide range of targets varying in scale, and irregular changes in appearance. These challenges significantly impact the optimization of robust tracker performance. Therefore, it is imperative to extract diverse features with dynamic adaptive learning capabilities for the target being tracked in each sequence. In this article, we explore a novel adaptive multi-scale Transformer (MT) tracker for satellite videos to explore the potential spatiotemporal information of the target effectively. Specifically, a multi-scale spatial Transformer (MSST) is designed to leverage stage-by-stage spatial reduction and channel doubling, thereby enhancing the representation capabilities for the tracked target. In dynamic feature learning, an adaptive temporal Transformer (ATT) is then introduced based on multiple cross attentions, which analyzes the adaptive learning capacity for the dynamic target. It analyzes the weight proportion of different attentions automatically in the specific sequence through the learnable parameters. Finally, a multi-scale feature (MSF) regression module is crafted to improve the positioning accuracy of targets with low pixel counts in satellite scenes. This module accomplishes precise annotation of target boxes by effectively fusing features from diverse stages. We evaluate the proposed tracker performance on several public satellite datasets, including SatSOT, SV248S, and VISO. Experimental results show that the performance of our model can be comparable to the state-of-the-art trackers.
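The abstract names two mechanisms without giving details: stage-by-stage spatial reduction with channel doubling, and learnable weighting of several attention outputs. The following is only an illustrative sketch of those two ideas in plain Python, not the authors' implementation. The halving factor of 2, the softmax normalisation of the weights, and the function names `stage_shapes`, `softmax`, and `fuse_attention_outputs` are all assumptions made for this example.

```python
import math


def stage_shapes(height, width, channels, num_stages):
    """Return the (H, W, C) feature shape at each stage.

    Assumption: each stage halves the spatial size and doubles the channels.
    The abstract does not state the actual reduction factor.
    """
    shapes = []
    for _ in range(num_stages):
        shapes.append((height, width, channels))
        height, width, channels = height // 2, width // 2, channels * 2
    return shapes


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attention_outputs(outputs, logits):
    """Weighted sum of equally sized attention outputs.

    `logits` stands in for the learnable parameters; softmax normalisation
    is an assumption, chosen so that the weights are positive and sum to 1.
    """
    weights = softmax(logits)
    length = len(outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(length)
    ]
```

Under these assumptions, `stage_shapes(64, 64, 32, 3)` yields shapes (64, 64, 32), (32, 32, 64), and (16, 16, 128), and equal logits give every attention output the same weight.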

Pages: 16

50 items in total

[1] Effective and Robust: A Discriminative Temporal Learning Transformer for Satellite Videos
Zhang, Xin
Jiao, Licheng
Li, Lingling
Liu, Xu
Liu, Fang
Yang, Shuyuan
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
[2] CSLT: Contourlet-Based Siamese Learning Tracker for Dim and Small Targets in Satellite Videos
Wu, Yinan
Jiao, Licheng
Liu, Fang
Pi, Zhaoliang
Liu, Xu
Li, Lingling
Yang, Shuyuan
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
[3] A Fast Adaptive Multi-Scale Kernel Correlation Filter Tracker for Rigid Object
Zheng, Kaiyuan
Zhang, Zhiyong
Qiu, Changzhen
SENSORS, 2022, 22 (20)
[4] Seismic Data Interpolation Based on Multi-Scale Transformer
Guo, Yuanqi
Fu, Lihua
Li, Hongwei
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
[5] Multi-TranResUnet: An Improved Transformer Network for Solving Multi-Scale Issues in Image Segmentation
Kang, Yajing
Cheng, Shuai
Guo, Liang
Zheng, Chao
Zhao, Jizhuang
IEEE ACCESS, 2024, 12 : 129000 - 129011
[6] Voxel-Based Multi-Scale Transformer Network for Event Stream Processing
Liu, Daikun
Wang, Teng
Sun, Changyin
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (04) : 2112 - 2124
[7] An Improved Transformer Network With Multi-Scale Convolution for Weed Identification in Sugarcane Field
Sun, Cuimin
Zhang, Menghua
Zhou, Muchen
Zhou, Xingzhi
IEEE ACCESS, 2024, 12 : 31168 - 31181
[8] EMSFomer: Efficient Multi-Scale Transformer for Real-Time Semantic Segmentation
Xia, Zhengyu
Kim, Joohee
IEEE ACCESS, 2025, 13 : 18239 - 18252
[9] A Graph Association Motion-Aware Tracker for Tiny Object in Satellite Videos
Huang, Zhongjian
Jiao, Licheng
Zhang, Jinyue
Liu, Xu
Liu, Fang
Zhang, Xiangrong
Li, Lingling
Chen, Puhua
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 12907 - 12922
[10] Transformer-Based Multi-Scale Feature Remote Sensing Image Classification Model
Sun, Ting
Li, Jun
Zhou, Xiangrui
Chen, Zan
IEEE ACCESS, 2025, 13 : 34095 - 34104