Adaptive Multi-Scale Transformer Tracker for Satellite Videos

Authors
Zhang, Xin [1]
Jiao, Licheng [1]
Li, Lingling [1]
Liu, Xu [1]
Liu, Fang [1]
Ma, Wenping [1]
Yang, Shuyuan [1]
Affiliation
[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Sch Artificial Intelligence, Minist Educ China, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024, Vol. 62
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Transformers; Satellites; Target tracking; Videos; Video tracking; Computational modeling; Adaptive Transformer; multi-scale Transformer (MT); object regression; satellite video tracking; OBJECT TRACKING;
DOI
10.1109/TGRS.2024.3441038
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708; 070902;
Abstract
Satellite video tracking is often characterized by blurred foreground boundaries in vast scenes, targets that vary widely in scale, and irregular changes in appearance. These challenges make it difficult to optimize a robust tracker. It is therefore imperative to extract diverse features with dynamic adaptive learning capabilities for the target tracked in each sequence. In this article, we propose a novel adaptive multi-scale Transformer (MT) tracker for satellite videos that effectively exploits the potential spatiotemporal information of the target. Specifically, a multi-scale spatial Transformer (MSST) is designed to leverage stage-by-stage spatial reduction and channel doubling, thereby enhancing the representation capabilities for the tracked target. For dynamic feature learning, an adaptive temporal Transformer (ATT) based on multiple cross attentions is then introduced to improve the adaptive learning capacity for the dynamic target. It determines the weight proportion of the different attentions automatically for each specific sequence through learnable parameters. Finally, a multi-scale feature (MSF) regression module is crafted to improve the positioning accuracy of targets with low pixel counts in satellite scenes. This module achieves precise localization of target boxes by effectively fusing features from diverse stages. We evaluate the proposed tracker on several public satellite datasets, including SatSOT, SV248S, and VISO. Experimental results show that the performance of our model is comparable to that of state-of-the-art trackers.
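The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record does not give layer details, so the halving factor in `stage_shapes`, the projection-free single-head attention in `cross_attention`, the softmax normalization of the learnable weights, and all function and variable names are assumptions made for illustration only.

```python
import numpy as np


def stage_shapes(height, width, channels, num_stages):
    """List (H, W, C) per stage, assuming each stage halves H and W and doubles C."""
    shapes = []
    for _ in range(num_stages):
        shapes.append((height, width, channels))
        height, width, channels = height // 2, width // 2, channels * 2
    return shapes


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query, memory):
    """Single-head scaled dot-product cross attention without learned projections."""
    d = query.shape[-1]
    scores = query @ memory.T / np.sqrt(d)      # (num_query, num_memory)
    return softmax(scores, axis=-1) @ memory    # (num_query, d)


def adaptive_fusion(query, memories, logits):
    """Blend several cross-attention outputs with softmax-normalized scalar weights.

    `logits` stands in for the learnable parameters; here they are plain numbers.
    """
    weights = softmax(np.asarray(logits, dtype=float))
    outputs = [cross_attention(query, m) for m in memories]
    fused = sum(w * o for w, o in zip(weights, outputs))
    return fused, weights


# Hypothetical sizes: 16 search-region tokens, three template sets of 4 tokens each.
rng = np.random.default_rng(0)
search = rng.standard_normal((16, 32))
templates = [rng.standard_normal((4, 32)) for _ in range(3)]
logits = np.zeros(3)  # equal logits give equal weights before any learning

fused, weights = adaptive_fusion(search, templates, logits)
shapes = stage_shapes(64, 64, 32, 3)
```

In a trained model the weights would be updated per sequence by gradient descent, so one attention branch could dominate when it proves more informative; the fixed zeros here only show the mixing step.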
Pages: 16