A Neural Autoregressive Approach to Attention-based Recognition

Cited by: 0
Authors
Yin Zheng
Richard S. Zemel
Yu-Jin Zhang
Hugo Larochelle
Affiliations
[1] Tsinghua University, Department of Electronic Engineering
[2] University of Toronto, Department of Computer Science
[3] Université de Sherbrooke, Département d'informatique
Source
International Journal of Computer Vision | 2015 / Volume 113
Keywords
Deep learning; Attention-based recognition; Neural networks; Neural autoregressive distribution estimator
DOI
Not available
Abstract
Tasks that require the synchronization of perception and action are incredibly hard and pose a fundamental challenge to the fields of machine learning and computer vision. One important example of such a task is the problem of performing visual recognition through a sequence of controllable fixations; this requires jointly deciding what inference to perform from fixations and where to perform these fixations. While these two problems are challenging when addressed separately, they become even more formidable if solved jointly. Recently, a restricted Boltzmann machine (RBM) model was proposed that could learn meaningful fixation policies and achieve good recognition performance. In this paper, we propose an alternative approach based on a feed-forward, auto-regressive architecture, which permits exact calculation of training gradients (given the fixation sequence), unlike for the RBM model. On a problem of facial expression recognition, we demonstrate the improvement gained by this alternative approach. Additionally, we investigate several variations of the model in order to shed some light on successful strategies for fixation-based recognition.
Pages: 67-79
Page count: 12