Saliency-based Sequential Image Attention with Multiset Prediction

Times cited: 0
Authors
Welleck, Sean [1]
Mao, Jialin [1]
Cho, Kyunghyun [1]
Zhang, Zheng [1]
Affiliation
[1] NYU, New York, NY 10003 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017) | 2017 / Vol. 30
Keywords
VISUAL-ATTENTION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Humans process visual scenes selectively and sequentially using attention. Central to models of human visual attention is the saliency map. We propose a hierarchical visual architecture that operates on a saliency map and uses a novel attention mechanism to sequentially focus on salient regions and take additional glimpses within those regions. The architecture is motivated by human visual attention, and is used for multi-label image classification on a novel multiset task, demonstrating that it achieves high precision and recall while localizing objects with its attention. Unlike conventional multi-label image classification models, the model supports multiset prediction due to a reinforcement-learning based training process that allows for arbitrary label permutation and multiple instances per label.
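As an illustration of what "arbitrary label permutation and multiple instances per label" implies for evaluation, the following is a minimal sketch of precision and recall computed over multisets of labels. It is not taken from the paper: the function name `multiset_precision_recall` and the use of `collections.Counter` are assumptions made here for illustration only.

```python
from collections import Counter


def multiset_precision_recall(predicted, target):
    """Precision and recall between two label multisets.

    Hypothetical helper for illustration; not the paper's implementation.
    Order of labels is ignored, but repeated labels are counted.
    """
    pred_counts = Counter(predicted)
    target_counts = Counter(target)
    # Multiset intersection: per label, the smaller of the two counts.
    overlap = sum((pred_counts & target_counts).values())
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(target) if target else 0.0
    return precision, recall


# Same labels in a different order still score perfectly.
print(multiset_precision_recall([3, 5, 3], [5, 3, 3]))
# Missing one of two instances of label 3 lowers recall, not precision.
print(multiset_precision_recall([3, 5], [3, 3, 5]))
```

Under this sketch, a prediction is penalized only for missing or surplus instances, never for the order in which labels are emitted, which matches the property the abstract describes.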
Pages: 11