Unsupervised foveal vision neural architecture with top-down attention

Cited by: 2
Authors
Burt, Ryan [1 ]
Thigpen, Nina N. [1 ]
Keil, Andreas [1 ]
Principe, Jose C. [1 ]
Affiliations
[1] Univ Florida, Dept Elect & Comp Engn, Computat NeuroEngn Lab, Gainesville, FL 32601 USA
Keywords
Unsupervised learning; Foveal vision; Top-down saliency; Deep learning; SALIENCY DETECTION; VISUAL-ATTENTION; RECOGNITION; MODEL; PERCEPTION; PATHWAYS; NETWORK; OBJECTS; SEARCH; SCENE
DOI
10.1016/j.neunet.2021.03.003
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning architectures are an extremely powerful tool for recognizing and classifying images. However, they require supervised learning, typically operate on vectors the size of the image's pixels, and produce their best results when trained on millions of object images. To help mitigate these issues, we propose an end-to-end architecture that fuses bottom-up saliency and top-down attention with an object recognition module, focusing on relevant data and learning important features that can later be fine-tuned for a specific task, using only unsupervised learning. In addition, by employing a virtual fovea that attends to relevant portions of the data, training speed can be greatly improved. We test the performance of the proposed Gamma saliency technique on the Toronto and CAT 2000 databases, and the foveated vision on the large Street View House Numbers (SVHN) database. The foveated-vision results show that Gamma saliency performs at the same level as the best alternative algorithms while being computationally faster. The SVHN results show that our unsupervised cognitive architecture is comparable to fully supervised methods, and that saliency also improves CNN performance if desired. Finally, we develop and test a top-down attention mechanism, based on Gamma saliency applied to the top layer of CNNs, to facilitate scene understanding in multi-object cluttered images. We show that the extra information from top-down saliency speeds up the extraction of digits in the cluttered multi-digit MNIST data set, corroborating the important role of top-down attention. (C) 2021 Elsevier Ltd. All rights reserved.
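The virtual-fovea idea described in the abstract — compute a bottom-up saliency map, then crop the most salient region before feeding it to a recognizer — can be sketched roughly as follows. This is a minimal illustration only: it uses a generic center-surround (difference-of-box-means) saliency as a stand-in, not the paper's Gamma-kernel saliency, and all function names are hypothetical.

```python
import numpy as np

def box_blur(img, k):
    """Mean filter with an odd kernel size k (edge-padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def center_surround_saliency(img, k_center=3, k_surround=15):
    # Generic center-surround contrast: regions that differ from their
    # local surround score high. A stand-in for Gamma saliency.
    return np.abs(box_blur(img, k_center) - box_blur(img, k_surround))

def foveal_crop(img, saliency, size=32):
    # "Fovea": crop a square patch around the most salient location,
    # clamped so the patch stays inside the image.
    y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
    h, w = img.shape
    y0 = min(max(y - size // 2, 0), h - size)
    x0 = min(max(x - size // 2, 0), w - size)
    return img[y0:y0 + size, x0:x0 + size]

# Demo: a bright 4x4 blob in a 64x64 image is located and cropped.
img = np.zeros((64, 64))
img[40:44, 10:14] = 1.0
patch = foveal_crop(img, center_surround_saliency(img), size=32)
```

In the paper's pipeline, it is the cropped patch rather than the full frame that would be passed on to the unsupervised feature-learning module, which is where the claimed training-speed gain comes from.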
Pages: 145-159
Page count: 15
References
86 records total
[41] Gu CH, 2009, Proc. CVPR IEEE, p. 1030. DOI: 10.1109/CVPRW.2009.5206727
[42] Gu, Ke; Zhai, Guangtao; Lin, Weisi; Yang, Xiaokang; Zhang, Wenjun. Visual Saliency Detection With Free Energy Theory [J]. IEEE Signal Processing Letters, 2015, 22(10): 1552-1555.
[43] Guo, Chenlei; Zhang, Liming. A Novel Multiresolution Spatiotemporal Saliency Detection Model and Its Applications in Image and Video Compression [J]. IEEE Transactions on Image Processing, 2010, 19(1): 185-198.
[44] Hae Jong Seo, 2009, Proc. IEEE CVPR Workshops, p. 45. DOI: 10.1109/CVPR.2009.5204207
[45] Harel J., 2007, Proc. NIPS.
[46] Hasanbelliu E., 2008, P IEEE WORKSH MACH L.
[47] Itti, L. Automatic foveation for video compression using a neurobiological model of visual attention [J]. IEEE Transactions on Image Processing, 2004, 13(10): 1304-1318.
[48] Itti, L; Koch, C; Niebur, E. A model of saliency-based visual attention for rapid scene analysis [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254-1259.
[49] Jaderberg M, 2015, Advances in Neural Information Processing Systems, Vol. 28.
[50] Judd T., 2012, MIT Technical Report.