HiFT: Hierarchical Feature Transformer for Aerial Tracking

被引:184
作者
Cao, Ziang [1 ]
Fu, Changhong [1 ]
Ye, Junjie [1 ]
Li, Bowen [1 ]
Li, Yiming [2 ]
机构
[1] Tongji Univ, Shanghai, Peoples R China
[2] NYU, New York, NY 10003 USA
来源
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021年
基金
中国国家自然科学基金; 上海市自然科学基金;
关键词
D O I
10.1109/ICCV48922.2021.01517
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Most existing Siamese-based tracking methods execute the classification and regression of the target object based on the similarity maps. However, they either employ a single map from the last convolutional layer which degrades the localization accuracy in complex scenarios or separately use multiple maps for decision making, introducing intractable computations for aerial mobile platforms. Thus, in this work, we propose an efficient and effective hierarchical feature transformer (HiFT) for aerial tracking. Hierarchical similarity maps generated by multi-level convolutional layers are fed into the feature transformer to achieve the interactive fusion of spatial (shallow layers) and semantics cues (deep layers). Consequently, not only the global contextual information can be raised, facilitating the target search, but also our end-to-end architecture with the transformer can efficiently learn the interdependencies among multi-level features, thereby discovering a tracking-tailored feature space with strong discriminability. Comprehensive evaluations on four aerial benchmarks have proven the effectiveness of HiFT. Real-world tests on the aerial platform have strongly validated its practicability with a real-time speed. Our code is available at https: //github.com/vision4robotics/HiFT.
引用
收藏
页码:15437 / 15446
页数:10
相关论文
共 53 条
[1]  
[Anonymous], 2021, PROC CVPR IEEE, DOI DOI 10.1109/CVPR46437.2021.00007
[2]   Fully-Convolutional Siamese Networks for Object Tracking [J].
Bertinetto, Luca ;
Valmadre, Jack ;
Henriques, Joao F. ;
Vedaldi, Andrea ;
Torr, Philip H. S. .
COMPUTER VISION - ECCV 2016 WORKSHOPS, PT II, 2016, 9914 :850-865
[3]   Learning Discriminative Model Prediction for Tracking [J].
Bhat, Goutam ;
Danelljan, Martin ;
Van Gool, Luc ;
Timofte, Radu .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :6181-6190
[4]  
Bolme DS, 2010, PROC CVPR IEEE, P2544, DOI 10.1109/CVPR.2010.5539960
[5]  
Bonatti R, 2019, IEEE INT C INT ROBOT, P229, DOI [10.1109/IROS40897.2019.8968163, 10.1109/iros40897.2019.8968163]
[6]   HiFT: Hierarchical Feature Transformer for Aerial Tracking [J].
Cao, Ziang ;
Fu, Changhong ;
Ye, Junjie ;
Li, Bowen ;
Li, Yiming .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :15437-15446
[7]  
Carion N, 2020, EUR C COMP VIS, P213
[8]   Probabilistic Regression for Visual Tracking [J].
Danelljan, Martin ;
Van Gool, Luc ;
Timofte, Radu .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :7181-7190
[9]   ATOM: Accurate Tracking by Overlap Maximization [J].
Danelljan, Martin ;
Bhat, Goutam ;
Khan, Fahad Shahbaz ;
Felsberg, Michael .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :4655-4664
[10]   ECO: Efficient Convolution Operators for Tracking [J].
Danelljan, Martin ;
Bhat, Goutam ;
Khan, Fahad Shahbaz ;
Felsberg, Michael .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6931-6939