Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition

Times cited: 89
Authors
Kheradpisheh, Saeed Reza [1,5]
Ganjtabesh, Mohammad [1]
Masquelier, Timothee [2,3,4,5]
Affiliations
[1] Univ Tehran, Sch Math Stat & Comp Sci, Dept Comp Sci, Tehran, Iran
[2] INSERM, U968, F-75012 Paris, France
[3] Univ Paris 06, Univ Sorbonne, Inst Vis, UMR S 968, F-75012 Paris, France
[4] CNRS, UMR 7210, F-75012 Paris, France
[5] Univ Toulouse, CERCO, CNRS, UMR 5549, F-31300 Toulouse, France
Keywords
View-invariant object recognition; Visual cortex; STDP; Spiking neurons; Temporal coding; Intermediate complexity; Cortex; Information; Plasticity; Vision; Selectivity; Fragments; Dynamics; Models
DOI
10.1016/j.neucom.2016.04.029
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The retinal image of surrounding objects varies tremendously due to changes in position, size, pose, illumination conditions, background context, occlusion, noise, and non-rigid deformations. Despite these huge variations, our visual system can invariantly recognize any object in just a fraction of a second. To date, various computational models have been proposed to mimic the hierarchical processing of the ventral visual pathway, with limited success. Here, we show that combining a biologically inspired network architecture with a biologically inspired learning rule significantly improves the model's performance on challenging invariant object recognition problems. Our model is an asynchronous feed-forward spiking neural network. When the network is presented with natural images, the neurons in the entry layers detect edges and the most activated ones fire first, while neurons in higher layers are equipped with spike timing-dependent plasticity (STDP). These neurons progressively become selective to intermediate-complexity visual features appropriate for object categorization. The model is evaluated on the 3D-Object and ETH-80 datasets, two benchmarks for invariant object recognition, and is shown to outperform state-of-the-art models, including DeepConvNet and HMAX. This demonstrates its ability to accurately recognize different instances of multiple object classes even under various appearance conditions (different views, scales, tilts, and backgrounds). Several statistical analysis techniques are used to show that our model extracts class-specific and highly informative features. (C) 2016 Elsevier B.V. All rights reserved.
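The abstract compresses two mechanisms into one sentence: intensity-to-latency coding in the entry layers (the most activated edge detectors fire first) and STDP in the higher layers. The Python sketch below illustrates both on a toy input, under stated assumptions; it is not the authors' implementation. The latency mapping `1 / response`, the non-leaky integrate-and-fire neuron, the rates `A_PLUS` and `A_MINUS`, the multiplicative `w * (1 - w)` form of the update, and all function names are hypothetical choices made here. Edge-detector responses are given as plain numbers rather than computed from an image, and the single neuron has no lateral inhibition, convolution, or pooling.

```python
# Hypothetical sketch (not the authors' code): rank-order latency coding
# followed by a simplified STDP update for a single postsynaptic neuron.
import math
from typing import List

A_PLUS = 0.004   # assumed potentiation rate
A_MINUS = 0.003  # assumed depression rate (magnitude)


def latency_code(responses: List[float]) -> List[float]:
    """Stronger edge responses spike earlier: latency = 1 / response.
    Non-positive responses never spike (infinite latency)."""
    return [1.0 / r if r > 0 else math.inf for r in responses]


def first_spike_time(weights: List[float], spike_times: List[float],
                     threshold: float) -> float:
    """Non-leaky integrate-and-fire: add each input's weight in order of
    arrival and return the time the potential first reaches threshold
    (math.inf if it never does)."""
    potential = 0.0
    for t, w in sorted(zip(spike_times, weights)):
        if math.isinf(t):
            break  # remaining inputs never spike
        potential += w
        if potential >= threshold:
            return t
    return math.inf


def stdp_update(weights: List[float], pre_times: List[float],
                post_time: float) -> List[float]:
    """Potentiate inputs that spiked at or before the postsynaptic spike,
    depress the others. The w * (1 - w) factor keeps weights in [0, 1]
    as long as both rates are at most 1."""
    updated = []
    for w, t_pre in zip(weights, pre_times):
        rate = A_PLUS if t_pre <= post_time else -A_MINUS
        updated.append(w + rate * w * (1.0 - w))
    return updated


if __name__ == "__main__":
    # Toy input: four afferents, the first two carry strong edge responses.
    times = latency_code([0.9, 0.8, 0.1, 0.05])
    weights = [0.5] * 4
    for _ in range(200):
        t_post = first_spike_time(weights, times, threshold=1.0)
        if math.isinf(t_post):
            break  # neuron stayed silent: nothing to learn from
        weights = stdp_update(weights, times, t_post)
    # Early (strong) afferents end up above 0.5, late ones below.
    print([round(w, 3) for w in weights])
```

Run as a script, it prints the four learned weights: the two early-spiking afferents rise above 0.5 and the two late ones fall below it, which is the sense in which the neuron becomes selective to the most strongly activated inputs. Because the update depends only on spike order, not on exact spike times, the sketch stays close to the "most activated ones fire first" idea in the abstract.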
Pages: 382-392
Number of pages: 11