UAV-Ground Visual Tracking: A Unified Dataset and Collaborative Learning Approach

Cited by: 6

Authors:

Sun, Dengdi ^{[1,2,3]}

Cheng, Leilei ^{[4]}

Chen, Song ^{[4]}

Li, Chenglong ^{[1,2]}

Xiao, Yun ^{[1,2]}

Luo, Bin ^{[4]}

Affiliations:

[1] Anhui Univ, Minist Educ, Key Lab Intelligent Comp & Signal Proc, Hefei 230601, Peoples R China

[2] Anhui Univ, Sch Artificial Intelligence, Hefei 230601, Peoples R China

[3] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230026, Peoples R China

[4] Anhui Univ, Sch Comp Sci & Technol, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 05

Funding:

National Natural Science Foundation of China;

Keywords:

Target tracking; Visualization; Autonomous aerial vehicles; Object tracking; Task analysis; Fuses; Video sequences; Visual tracking; transformer; UAV and ground views; benchmark dataset; collaborative learning; OBJECT TRACKING; FUSION; ROBUST;

DOI:

10.1109/TCSVT.2023.3316990

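Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract: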

Visual tracking from the ground view and the UAV view has received increasing attention due to its wide range of practical applications. These two tasks offer strongly complementary descriptions of the target object, such as detailed appearance in the ground view and global motion information in the UAV view, and their combination has the potential to make the tracking system more robust. However, no work has studied this problem in depth, and it is challenging to accurately combine the ground view information and the UAV view information. To fill the gap and address the challenge, we propose a new computer vision task called UAV-Ground visual tracking. Considering the lack of relevant data and methods, we first construct a unified video dataset called UGVT, which includes 210 pairs of high-resolution UAV and ground video sequences totaling more than 204K frames and can serve as a comprehensive evaluation platform for relevant tracking methods. Second, based on the newly constructed dataset, we propose a co-learning method called MvCL to fuse the information of the ground and UAV views. It first associates the same tracking target in the two views using a cross-attention operation and then fuses the complementary information of the two views. In particular, as a plug-and-play module based on the Transformer structure, this method can be flexibly embedded into different tracking frameworks. Extensive experiments are conducted on the newly created dataset. The results demonstrate the effectiveness of the proposed method in improving the robustness of the tracking system compared with 10 state-of-the-art tracking methods, and also indicate the promise and significance of UAV-Ground visual tracking research. The dataset is available at: https://github.com/mmic-lcl/Datasets-and-benchmark-code/.
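
As a rough illustration of the cross-attention idea mentioned in the abstract, the following minimal sketch lets the tokens of each view attend over the tokens of the other view and then adds the attended features back as a residual. It is not the authors' MvCL implementation: the function names (`cross_attention`, `fuse_views`), the absence of learned query/key/value projections, and the residual-sum fusion are all assumptions made only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token attends over all key tokens.

    No learned projections are used here (an assumption for brevity).
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


def fuse_views(ground_tokens, uav_tokens):
    """Hypothetical fusion: add to each view what it attends to in the other view."""
    ground_from_uav = cross_attention(ground_tokens, uav_tokens, uav_tokens)
    uav_from_ground = cross_attention(uav_tokens, ground_tokens, ground_tokens)
    fused_ground = [[g + c for g, c in zip(gt, ct)]
                    for gt, ct in zip(ground_tokens, ground_from_uav)]
    fused_uav = [[u + c for u, c in zip(ut, ct)]
                 for ut, ct in zip(uav_tokens, uav_from_ground)]
    return fused_ground, fused_uav


if __name__ == "__main__":
    # Toy feature tokens: 3 ground-view tokens and 2 UAV-view tokens, 2 dimensions each.
    ground = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    uav = [[0.5, 0.5], [2.0, 0.0]]
    fused_ground, fused_uav = fuse_views(ground, uav)
    print(len(fused_ground), len(fused_uav))
```

Each fused view keeps its own token count and feature dimension, which is what would allow a module of this kind to be dropped into an existing tracker without changing the shapes its later layers expect.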


Pages: 3619-3632

Page count: 14

