SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking

Cited: 12
Authors
Yao, Liangliang [1 ]
Fu, Changhong [1 ]
Li, Sihang [1 ]
Zheng, Guangze [2 ]
Ye, Junjie [1 ]
Affiliations
[1] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
[2] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2023
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai
DOI
10.1109/ICRA48891.2023.10161487
CLC Classification Number
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Vision-based object tracking has boosted extensive autonomous applications for unmanned aerial vehicles (UAVs). However, the dynamic changes in flight maneuvers and viewpoint encountered in UAV tracking pose significant difficulties, e.g., aspect ratio change and scale variation. The conventional cross-correlation operation, while commonly used, has limitations in effectively capturing perceptual similarity and incorporates extraneous background information. To mitigate these limitations, this work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking. The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation and effectively discriminate foreground from background information. Additionally, a saliency adaptation embedding operation dynamically generates tokens based on initial saliency, thereby reducing the computational complexity of the Transformer architecture. Finally, a lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information. The efficacy and robustness of the proposed approach have been thoroughly assessed through experiments on three widely used UAV tracking benchmarks and in real-world scenarios, with results demonstrating its superiority. The source code and demo videos are available at https://github.com/vision4robotics/SGDViT.
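The token-reduction idea described in the abstract (generating Transformer tokens only from salient spatial locations, which shrinks the quadratic self-attention cost) can be sketched roughly as follows. This is a minimal illustrative assumption, not the paper's actual implementation: the function name `saliency_token_selection`, the top-k keep rule, and the tensor shapes are all hypothetical.

```python
import numpy as np

def saliency_token_selection(feature_map, saliency, keep_ratio=0.5):
    """Keep only the Transformer tokens at the most salient spatial locations.

    feature_map: (H, W, C) backbone features
    saliency:    (H, W) saliency scores, higher = more salient
    Returns the kept tokens, shape (K, C), ordered by decreasing saliency.
    With K = keep_ratio * H * W tokens, self-attention cost drops from
    O((H*W)^2) to O(K^2).
    """
    H, W, C = feature_map.shape
    tokens = feature_map.reshape(H * W, C)
    scores = saliency.reshape(H * W)
    k = max(1, int(round(keep_ratio * H * W)))
    keep = np.argsort(-scores)[:k]  # indices of the top-k salient positions
    return tokens[keep]

# Toy example: a 4x4 feature map with 8-dim features, keeping half the tokens.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 8))
sal = rng.random((4, 4))
kept = saliency_token_selection(feat, sal, keep_ratio=0.5)
print(kept.shape)  # (8, 8): 8 kept tokens, 8 channels each
```

In a real tracker the saliency map would itself be predicted (here by the proposed saliency mining network) rather than random, and the kept tokens would then be fed to the Transformer encoder.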
Pages: 3353 - 3359
Page count: 7