Unsupervised foveal vision neural architecture with top-down attention

Cited: 2
Authors
Burt, Ryan [1 ]
Thigpen, Nina N. [1 ]
Keil, Andreas [1 ]
Principe, Jose C. [1 ]
Affiliation
[1] Univ Florida, Dept Elect & Comp Engn, Computat NeuroEngn Lab, Gainesville, FL 32601 USA
Keywords
Unsupervised learning; Foveal vision; Top-down saliency; Deep learning; Saliency detection; Visual attention; Recognition; Model; Perception; Pathways; Network; Objects; Search; Scene
DOI
10.1016/j.neunet.2021.03.003
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep learning architectures are an extremely powerful tool for recognizing and classifying images. However, they require supervised learning, normally operate on input vectors the size of the full image, and produce their best results only when trained on millions of object images. To help mitigate these issues, we propose an end-to-end architecture that fuses bottom-up saliency and top-down attention with an object recognition module, focusing on relevant data and learning important features that can later be fine-tuned for a specific task, using only unsupervised learning. In addition, a virtual fovea that attends to relevant portions of the data greatly improves training speed. We test the performance of the proposed Gamma saliency technique on the Toronto and CAT2000 databases, and the foveated vision on the large Street View House Numbers (SVHN) database. The saliency results show that Gamma saliency performs at the same level as the best alternative algorithms while being computationally faster. The SVHN results show that our unsupervised cognitive architecture is comparable to fully supervised methods and that saliency also improves CNN performance when desired. Finally, we develop and test a top-down attention mechanism, based on Gamma saliency applied to the top layer of a CNN, to facilitate scene understanding in multi-object cluttered images. We show that the extra information from top-down saliency speeds up the extraction of digits in the cluttered multi-digit MNIST data set, corroborating the important role of top-down attention. (C) 2021 Elsevier Ltd. All rights reserved.
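The abstract describes Gamma saliency only at a high level. As a rough illustration of the idea, a bottom-up saliency map can be built from a center-surround difference of 2-D gamma kernels. The sketch below is an assumption-laden reconstruction, not the paper's implementation: the kernel form g_{n,mu} proportional to r^(n-1) exp(-mu*r), the orders, the mu values, and the window size are all illustrative choices.

```python
# Hedged sketch of gamma-kernel center-surround saliency.
# ASSUMPTIONS: kernel form, orders (n), decay rates (mu), and window size
# are illustrative, not the published settings of Burt et al. (2021).
import numpy as np
from math import factorial
from scipy.signal import convolve2d

def gamma_kernel(n, mu, size=31):
    """2-D gamma kernel, normalized to unit sum.

    The kernel peaks on a ring of radius (n - 1) / mu, so a low order n
    yields a compact center blob and a high order n yields a surround ring.
    """
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r = np.sqrt(xx ** 2 + yy ** 2)
    k = (mu ** (n + 1) / (2.0 * np.pi * factorial(n))) \
        * r ** (n - 1) * np.exp(-mu * r)
    return k / k.sum()

def gamma_saliency(img, n_center=1, n_surround=20,
                   mu_center=1.0, mu_surround=2.0):
    """Convolve the image with a center-minus-surround difference of
    gamma kernels, then half-wave rectify to keep salient excitation."""
    dog = gamma_kernel(n_center, mu_center) - gamma_kernel(n_surround, mu_surround)
    sal = convolve2d(img, dog, mode="same", boundary="symm")
    return np.maximum(sal, 0.0)

# Example: place the virtual fovea at the saliency peak of a grayscale frame.
img = np.random.rand(128, 128)  # stand-in for a real grayscale image
sal = gamma_saliency(img)
fovea_row, fovea_col = np.unravel_index(np.argmax(sal), sal.shape)
```

Under these assumptions, the same routine could serve the top-down variant described in the abstract by running it over a CNN's top-layer activation map instead of the input image; the paper itself should be consulted for the actual kernel parameters and fusion details.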
Pages: 145-159
Page count: 15
相关论文
共 86 条
  • [1] Achanta R, 2009, PROC CVPR IEEE, P1597, DOI 10.1109/CVPRW.2009.5206596
  • [2] Advani S, 2013, INT CONF ACOUST SPEE, P2596, DOI 10.1109/ICASSP.2013.6638125
  • [3] Learning to See by Moving
    Agrawal, Pulkit
    Carreira, Joao
    Malik, Jitendra
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 37 - 45
  • [4] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
    Anderson, Peter
    He, Xiaodong
    Buehler, Chris
    Teney, Damien
    Johnson, Mark
    Gould, Stephen
    Zhang, Lei
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6077 - 6086
  • [5] [Anonymous], 2016, ARXIV150602351
  • [6] [Anonymous], 2015, ARXIV151203385V1CSCV
  • [7] [Anonymous], 1967, EYE MOVEMENTS VISION
  • [8] [Anonymous], 2007, J VISION
  • [9] [Anonymous], 2006, Digital Light Field Photography
  • [10] [Anonymous], 2007, 2007 IEEE C COMPUTER, DOI [10.1109/CVPR.2007.383267, DOI 10.1109/CVPR.2007.383267]