Feature Aggregation Networks Based on Dual Attention Capsules for Visual Object Tracking

Cited by: 19
Authors
Cao, Yi [1 ,2 ]
Ji, Hongbing [1 ]
Zhang, Wenbo [1 ]
Shirani, Shahram [2 ]
Affiliations
[1] Xidian Univ, Sch Elect Engn, Xian 710071, Peoples R China
[2] McMaster Univ, Dept Elect & Comp Engn, Hamilton, ON L8S 4K1, Canada
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Neurons; Visualization; Object tracking; Computational modeling; Interference; Benchmark testing; Visual object tracking; tracking-by-detection; feature capsule; group attention; penalty attention;
DOI
10.1109/TCSVT.2021.3063001
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Tracking-by-detection algorithms have considerably enhanced tracking performance with the introduction of recent convolutional neural networks (CNNs). However, most trackers directly exploit standard scalar-output CNN features, which may not capture enough feature-encoding information, rather than aggregated CNN features in vector-output form. In this paper, we propose an end-to-end feature aggregation capsule framework. First, building on an existing CNN backbone, we aggregate a number of similar position-aware CNN features into a capsule to model feature similarity. The resulting vector-level feature capsules (rather than the previous scalar-level pointwise features) are used for discriminative learning. Second, we propose a group attention module to better model the entity representation across different capsule groups, thereby improving overall discriminative capability. Third, to reduce the prediction interference caused by the increased dimensionality within capsules, we propose a penalty attention module. This strategy dynamically adjusts neuron values by estimating whether each neuron is beneficial or harmful to tracking. Experimental results on five representative benchmarks (UAVDT, DTB70, UAV123, VOT2016, and VOT2018) demonstrate the excellent tracking performance of our dual-attention-based capsule tracker (DACapT). Notably, it exceeds the previous top tracker by 4.6%/1.9% in precision/success evaluations on UAVDT.
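The abstract's core idea of aggregating scalar CNN channels into vector-valued feature capsules, then gating individual neurons, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the capsule dimension, the standard squashing nonlinearity, and the sigmoid gate standing in for the penalty attention module are all illustrative assumptions.

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Standard capsule squashing nonlinearity: keeps the vector's
    # direction while mapping its norm into [0, 1).
    sq = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def aggregate_capsules(feat, dim=8):
    # feat: (H, W, C) scalar CNN feature map. Group every `dim`
    # adjacent channels at each spatial position into one capsule,
    # yielding vector-level features of shape (H, W, C // dim, dim).
    h, w, c = feat.shape
    assert c % dim == 0, "channel count must divide evenly into capsules"
    caps = feat.reshape(h, w, c // dim, dim)
    return squash(caps)

def penalty_attention(caps, gate_w=1.0, gate_b=0.0):
    # Hypothetical stand-in for the penalty attention module: a
    # per-neuron sigmoid score estimates whether each neuron helps
    # or hurts, and scales it accordingly.
    gate = 1.0 / (1.0 + np.exp(-(caps * gate_w + gate_b)))
    return caps * gate

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 32)).astype(np.float32)
caps = aggregate_capsules(feat, dim=8)   # shape (4, 4, 4, 8)
out = penalty_attention(caps)            # same shape, gated per neuron
```

The squash step is what makes the capsule norm interpretable as an activation strength; the gate then suppresses neurons whose contribution the model judges harmful, which is the role the abstract assigns to the penalty attention module.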
Pages: 674-689
Page count: 16