Adaptive Multi-Scale Transformer Tracker for Satellite Videos

Cited by: 0
Authors
Zhang, Xin [1]
Jiao, Licheng [1]
Li, Lingling [1]
Liu, Xu [1]
Liu, Fang [1]
Ma, Wenping [1]
Yang, Shuyuan [1]
Affiliations
[1] Xidian Univ, Int Res Ctr Intelligent Percept & Computat, Sch Artificial Intelligence, Minist Educ China, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Volume 62
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Transformers; Satellites; Target tracking; Videos; Video tracking; Computational modeling; Adaptive Transformer; multi-scale Transformer (MT); object regression; satellite video tracking; OBJECT TRACKING;
DOI
10.1109/TGRS.2024.3441038
Chinese Library Classification number
P3 [Geophysics]; P59 [Geochemistry];
Subject classification codes
0708; 070902;
Abstract
Satellite video tracking is often characterized by blurred foreground boundaries in vast scenes, targets that vary widely in scale, and irregular changes in appearance. These challenges significantly hinder the optimization of robust trackers. It is therefore imperative to extract diverse features with dynamic adaptive learning capabilities for the target being tracked in each sequence. In this article, we explore a novel adaptive multi-scale Transformer (MT) tracker for satellite videos that effectively exploits the potential spatiotemporal information of the target. Specifically, a multi-scale spatial Transformer (MSST) is designed to leverage stage-by-stage spatial reduction and channel doubling, thereby enhancing the representation capabilities for the tracked target. For dynamic feature learning, an adaptive temporal Transformer (ATT) based on multiple cross attentions is then introduced, which analyzes the adaptive learning capacity for the dynamic target. It automatically analyzes the weight proportion of the different attentions in a specific sequence through learnable parameters. Finally, a multi-scale feature (MSF) regression module is crafted to improve the positioning accuracy of targets with low pixel counts in satellite scenes. This module accomplishes precise annotation of target boxes by effectively fusing features from diverse stages. We evaluate the proposed tracker on several public satellite datasets, including SatSOT, SV248S, and VISO. Experimental results show that our model performs comparably to state-of-the-art trackers.
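Two ideas named in the abstract can be illustrated with a minimal sketch: stage-by-stage spatial reduction with channel doubling, and weighting several attention outputs through learnable parameters. This is not the authors' implementation. The function names, the reduction factor of 2 per stage, and the softmax normalization of the weights are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def stage_shapes(height, width, channels, num_stages):
    """Return the (H, W, C) feature shape at each stage.

    Assumption: each stage halves the spatial size and doubles the
    channel count. The abstract only says "spatial reduction and
    channel doubling"; the factor of 2 for the spatial side is a guess.
    """
    shapes = []
    h, w, c = height, width, channels
    for _ in range(num_stages):
        shapes.append((h, w, c))
        h, w, c = h // 2, w // 2, c * 2
    return shapes


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attentions(attn_outputs, logits):
    """Combine several attention outputs with normalized weights.

    attn_outputs: list of equally sized feature vectors, one per attention.
    logits: one scalar per attention, standing in for learnable parameters.
    Assumption: the weights are softmax-normalized; the paper may use a
    different scheme.
    """
    weights = softmax(logits)
    dim = len(attn_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, attn_outputs))
        for i in range(dim)
    ]


if __name__ == "__main__":
    print(stage_shapes(64, 64, 32, 3))
    print(fuse_attentions([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
```

With equal logits the fusion reduces to a plain average of the attention outputs; in a trained model the logits would be updated by gradient descent so that each sequence can favor a different attention.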
Pages: 16