Target-Aware Transformer for Satellite Video Object Tracking

Cited by: 12
Authors
Lai, Pujian [1 ]
Zhang, Meili [1 ]
Cheng, Gong [1 ]
Li, Shengyang [2 ,3 ]
Huang, Xiankai [4 ]
Han, Junwei [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Peoples R China
[2] Chinese Acad Sci, Ctr Space Utilizat, Key Lab Space Utilizat Technol & Engn, Beijing 100094, Peoples R China
[3] Univ Chinese Acad Sci, Sch Aeronaut & Astronaut, Beijing 100049, Peoples R China
[4] Beijing Technol & Business Univ, Business Sch, Beijing 100048, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Funding
National Natural Science Foundation of China;
Keywords
Bi-direction propagation and fusion (Bi-PF); satellite video object tracking; target-aware enhancement (TAE); CORRELATION FILTER;
DOI
10.1109/TGRS.2023.3339658
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry];
Discipline Classification Codes
0708 ; 070902 ;
Abstract
Recent years have witnessed the astonishing development of the transformer-based paradigm in single object tracking (SOT) for generic videos. However, because the targets of interest in satellite videos are small in size and weak in visual appearance, the progress of transformer-based trackers in satellite video object tracking has been impeded. To alleviate this issue, a novel transformer-based recipe is proposed, which consists of a bi-direction propagation and fusion (Bi-PF) strategy and a target-aware enhancement (TAE) module. Concretely, we first adopt the Bi-PF strategy to make full use of multiscale information and generate discriminative representations of tracking targets. Then, the TAE module decouples an object query into a content-aware embedding and a spatial-aware embedding and produces a target prototype that helps obtain a high-quality content-aware embedding. It is worth mentioning that, unlike previous satellite video tracking methods, most of which evaluate performance on only a few videos, we conduct extensive experiments on the SatSOT dataset, which consists of 105 videos. In particular, the proposed method achieves a success score of 45.6% and a precision score of 57.6%, surpassing the baseline method by 5.0% and 9.5%, respectively.
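The abstract names two components: bidirectional (top-down then bottom-up) fusion across a multiscale feature pyramid, and a query decoupled into content/spatial halves with the content half refined by a target prototype pooled from the template. The paper's actual architecture is not reproduced here; the following is only a rough numpy sketch of those two ideas, with hypothetical function names and the simplifying assumption that each pyramid level is exactly 2x the resolution of the previous one:

```python
import numpy as np

rng = np.random.default_rng(0)

def bidirectional_fuse(feats):
    """Toy Bi-PF-style pass over a coarse-to-fine list of (C, H, W) maps,
    each level assumed to be exactly 2x the previous resolution."""
    # Top-down: 2x nearest-neighbour upsample, add to the finer map.
    td = [feats[0]]
    for f in feats[1:]:
        up = np.kron(td[-1], np.ones((1, 2, 2)))
        td.append(f + up)
    # Bottom-up: 2x2 average-pool, add back to the coarser map.
    bu = [td[-1]]
    for f in reversed(td[:-1]):
        c, h, w = bu[0].shape
        down = bu[0].reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
        bu.insert(0, f + down)
    return bu

def decouple_query(query, template_feat, target_mask):
    """Toy TAE-style step: split a query of size 2C into content/spatial
    halves; refine the content half with a target prototype obtained by
    masked average pooling over the (C, H, W) template features."""
    c = template_feat.shape[0]
    content, spatial = query[:c], query[c:]
    m = target_mask[None].astype(float)                    # (1, H, W)
    proto = (template_feat * m).reshape(c, -1).sum(1) / (m.sum() + 1e-6)
    return content + proto, spatial, proto

# Two-level pyramid: coarse (8, 4, 4) and fine (8, 8, 8) feature maps.
feats = [rng.standard_normal((8, 4, 4)), rng.standard_normal((8, 8, 8))]
fused = bidirectional_fuse(feats)

query = rng.standard_normal(16)                            # 2C with C = 8
mask = np.zeros((8, 8)); mask[3:5, 3:5] = 1                # target box in template
content, spatial, proto = decouple_query(query, feats[1], mask)
```

The sketch keeps only the data flow: fused levels retain their input shapes, and the prototype is simply the mean feature inside the target mask.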
Pages: 1-10 (10 pages)