SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking

Cited: 12
Authors
Yao, Liangliang [1 ]
Fu, Changhong [1 ]
Li, Sihang [1 ]
Zheng, Guangze [2 ]
Ye, Junjie [1 ]
Affiliations
[1] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
[2] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2023
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai
DOI
10.1109/ICRA48891.2023.10161487
CLC Classification Number
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Vision-based object tracking has boosted extensive autonomous applications for unmanned aerial vehicles (UAVs). However, the dynamic changes in flight maneuvers and viewpoint encountered in UAV tracking pose significant difficulties, e.g., aspect ratio change and scale variation. The conventional cross-correlation operation, while commonly used, has limitations in effectively capturing perceptual similarity and incorporates extraneous background information. To mitigate these limitations, this work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking. The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation and effectively discriminate foreground from background information. Additionally, a saliency adaptation embedding operation dynamically generates tokens based on initial saliency, thereby reducing the computational complexity of the Transformer architecture. Finally, a lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information. The efficacy and robustness of the proposed approach have been thoroughly assessed through experiments on three widely used UAV tracking benchmarks and in real-world scenarios, with results demonstrating its superiority. The source code and demo videos are available at https://github.com/vision4robotics/SGDViT.
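The token-reduction idea described in the abstract (generating Transformer tokens only from salient spatial locations, which shrinks the quadratic self-attention cost) can be sketched roughly as follows. This is a minimal illustrative assumption, not the paper's actual implementation: the function name `saliency_token_selection`, the top-k keep rule, and the tensor shapes are all hypothetical.

```python
import numpy as np

def saliency_token_selection(feature_map, saliency, keep_ratio=0.5):
    """Keep only the Transformer tokens at the most salient spatial locations.

    feature_map: (H, W, C) backbone features
    saliency:    (H, W) saliency scores, higher = more salient
    Returns the kept tokens, shape (K, C), ordered by decreasing saliency.
    With K = keep_ratio * H * W tokens, self-attention cost drops from
    O((H*W)^2) to O(K^2).
    """
    H, W, C = feature_map.shape
    tokens = feature_map.reshape(H * W, C)
    scores = saliency.reshape(H * W)
    k = max(1, int(round(keep_ratio * H * W)))
    keep = np.argsort(-scores)[:k]  # indices of the top-k salient positions
    return tokens[keep]

# Toy example: a 4x4 feature map with 8-dim features, keeping half the tokens.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 8))
sal = rng.random((4, 4))
kept = saliency_token_selection(feat, sal, keep_ratio=0.5)
print(kept.shape)  # (8, 8): 8 kept tokens, 8 channels each
```

In a real tracker the saliency map would itself be predicted (here by the proposed saliency mining network) rather than random, and the kept tokens would then be fed to the Transformer encoder.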
Pages: 3353 - 3359
Page count: 7