End-to-end wavelet block feature purification network for efficient and effective UAV object tracking

Cited by: 1
Authors
Wang, Haijun [1]
Qi, Lihua [1]
Qu, Haoyu
Ma, Wenlai [1, 2]
Yuan, Wei [1]
Hao, Wei [1]
Affiliations
[1] Binzhou Univ, Aviat Informat Technol Res & Dev, Binzhou, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Coll Civil Aviat, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Wavelet; Transformer; Unmanned aerial vehicle; Self-attention learning; Downsampling-upsampling strategy; Correlation filter
DOI
10.1016/j.jvcir.2023.103950
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recently, unmanned aerial vehicle (UAV) object tracking has improved significantly with the emergence of deep learning. However, owing to the object feature pollution caused by motion blur, illumination variation, and occlusion, most existing trackers often fail to precisely localize the target in complex real-world circumstances. To overcome this challenge, we present a novel wavelet block feature purification network (WFPN) for efficient and effective UAV tracking. WFPN is mainly composed of a downsampling network based on wavelet transforms and an upsampling network based on inverse wavelet transforms. Specifically, the downsampling network performs a discrete wavelet transform (DWT) to reduce interference information while preserving original feature details, and the upsampling network applies the inverse DWT (IDWT) to reconstruct decontaminated feature information. Additionally, a novel sequential encoder is introduced to achieve a better purification effect. Finally, a pooling distance loss is devised to improve the purification effect of the DWT downsampling network. Extensive experiments show that our WFPN achieves promising tracking performance on three well-known UAV benchmarks, especially on sequences with feature pollution. Moreover, our method runs at 33.2 frames per second on the Nvidia Jetson AGX Orin edge platform, which makes it suitable for UAVs with limited onboard payload and computing capability.
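The DWT downsampling and IDWT upsampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' WFPN: it assumes a single-level 2D Haar wavelet (the record does not say which wavelet the paper uses), operates on a plain nested list rather than learned feature maps, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical. It only shows that the transform halves the spatial resolution into four subbands and that the inverse transform recovers the input exactly.

```python
def haar_dwt2(x):
    """Single-level 2D Haar DWT of a nested list with even height and width.

    Returns four half-resolution subbands: ll (local average) and three
    detail bands. Subband naming conventions vary between libraries.
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(x), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2.0)   # low-pass in both directions
            r_lh.append((a + b - c - d) / 2.0)   # top row minus bottom row
            r_hl.append((a - b + c - d) / 2.0)   # left column minus right column
            r_hh.append((a - b - c + d) / 2.0)   # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution nested list."""
    rows, cols = len(ll), len(ll[0])
    x = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s, v, h, g = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + v + h + g) / 2.0
            x[2 * i][2 * j + 1] = (s + v - h - g) / 2.0
            x[2 * i + 1][2 * j] = (s - v + h - g) / 2.0
            x[2 * i + 1][2 * j + 1] = (s - v - h + g) / 2.0
    return x


# Toy 4x4 "feature map" (illustrative values only).
feature = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
ll, lh, hl, hh = haar_dwt2(feature)
restored = haar_idwt2(ll, lh, hl, hh)
```

Because the Haar transform is invertible, no information is lost by the downsampling itself; any suppression of interference would have to come from how the subbands are processed between the two transforms, which this sketch does not attempt to model.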
Pages: 14