Multi attention module for visual tracking

Times cited: 65
Authors
Chen, Boyu [1]
Li, Peixia [1]
Sun, Chong [1]
Wang, Dong [1]
Yang, Gang [2]
Lu, Huchuan [1]
Affiliations
[1] Dalian Univ Technol, Fac Elect Informat & Elect Engn, Sch Informat & Commun Engn, Dalian, Peoples R China
[2] Northeastern Univ, Fac Coll Informat Sci & Engn, Shenyang, Liaoning, Peoples R China
Keywords
Visual tracking; Deep neural network; Attention model; Long short-term memory
DOI
10.1016/j.patcog.2018.10.005
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a new visual tracking algorithm that leverages multi-level visual attention to make full use of the information available during tracking. Visual attention has been widely applied to many visual tasks, such as image captioning and question answering. However, most existing attention models focus on only one or two aspects and ignore other information that is useful for visual tracking. We identify four main attentional aspects in the tracking task and propose a unified network that leverages multi-level visual attention: layer-wise attention, temporal attention, spatial attention and channel-wise attention. Because deep features from different levels may suit different scenarios, we train an attention network in the offline stage to facilitate feature selection during online tracking. To better exploit the temporal consistency assumption of visual tracking, we implement the attention network with long short-term memory (LSTM) units, which capture historical context information and therefore support more reliable inference at the current time step. Unlike in image classification, background clutter in tracking is more complicated, so we purify the features with spatial attention and channel-wise attention to suppress background noise and highlight the target region. In addition, we share deep features across target candidates using Region of Interest (RoI) pooling, so that the features of all candidates are extracted in a single forward pass of the deep neural network (DNN). To further improve tracking accuracy and reduce the risk of tracking drift, we propose a strategy that promotes the tracker with the detection results of a generic object detector. The proposed tracking algorithm compares favorably against state-of-the-art methods on three popular benchmark datasets, and extensive experimental evaluations demonstrate the effectiveness of the proposed techniques. (C) 2018 Elsevier Ltd. All rights reserved.
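The abstract names layer-wise, spatial and channel-wise attention but gives no formulas for them. The following is a minimal pure-Python sketch of how such attention can re-weight deep features, not the authors' implementation. It assumes that the layer scores are supplied directly (in the paper they come from a learned, LSTM-based attention network), that all layers have already been resized to one C x H x W shape, and that parameter-free sigmoid and softmax stand-ins replace the learned channel and spatial branches. The function names (`layer_attention`, `channel_attention`, `spatial_attention`, `purify`) and the channel-then-spatial ordering are hypothetical. Temporal attention and RoI pooling are not modeled.

```python
import math
from typing import List

# Feature map layout (assumption): channels x height x width, as nested lists.
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layer_attention(layers: List[FeatureMap], scores: List[float]) -> FeatureMap:
    """Soft selection over feature levels: weighted sum with softmax(scores).

    Hypothetical helper: assumes every layer is already C x H x W and that
    `scores` stands in for the output of a learned attention network.
    """
    alphas = softmax(scores)
    c, h, w = len(layers[0]), len(layers[0][0]), len(layers[0][0][0])
    return [[[sum(a * layer[k][i][j] for a, layer in zip(alphas, layers))
              for j in range(w)] for i in range(h)] for k in range(c)]


def channel_attention(fmap: FeatureMap) -> FeatureMap:
    """Scale each channel by sigmoid(global average of that channel).

    Parameter-free stand-in for a learned channel-wise attention branch.
    """
    out = []
    for channel in fmap:
        mean = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        weight = sigmoid(mean)
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(fmap: FeatureMap) -> FeatureMap:
    """Scale each location by a softmax over the channel-averaged map.

    Parameter-free stand-in for a learned spatial attention branch; the
    weights over all H x W locations sum to 1.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Channel-averaged activation at each location, flattened row-major.
    avg = [sum(fmap[k][i][j] for k in range(c)) / c
           for i in range(h) for j in range(w)]
    weights = softmax(avg)
    return [[[fmap[k][i][j] * weights[i * w + j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def purify(fmap: FeatureMap) -> FeatureMap:
    """Channel-wise then spatial attention (ordering is an assumption)."""
    return spatial_attention(channel_attention(fmap))


if __name__ == "__main__":
    # One channel, 2 x 2: a strong "target" response amid weak background.
    fmap = [[[1.0, 1.0], [1.0, 5.0]]]
    out = purify(fmap)
    # Target/background ratio after purification (5.0 before).
    print(out[0][1][1] / out[0][0][0])
```

Because the spatial weights sum to 1 across locations, a strong response grows relative to weaker surrounding activations, which is the intuition behind using spatial attention to suppress background clutter. In the actual tracker the attention weights come from a network trained offline rather than from raw activation magnitudes.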
Pages: 80-93
Number of pages: 14