Top-down control of visual attention in object detection.

Cited by: 264
Authors
Oliva, A [1]
Torralba, A [1]
Castelhano, MS [1]
Henderson, JM [1]
Affiliation
[1] Michigan State Univ, E Lansing, MI 48824 USA
Source
2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL 1, PROCEEDINGS | 2003
DOI
10.1109/icip.2003.1246946
Chinese Library Classification (CLC) number
TB8 [Photographic technology];
Discipline classification code
0804;
Abstract
Current computational models of visual attention focus on bottom-up information and ignore scene context. However, studies in visual cognition show that humans use context to facilitate object detection in natural scenes by directing their attention or eyes to diagnostic regions. Here we propose a model of attention guidance based on global scene configuration. We show that the statistics of low-level features across the scene image determine where a specific object (e.g. a person) should be located. Human eye movements show that regions chosen by the top-down model agree with regions scrutinized by human observers performing a visual search task for people. The results validate the proposition that top-down information from visual context modulates the saliency of image regions during the task of object detection. Contextual information provides a shortcut for efficient object detection systems.
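The idea that scene context modulates bottom-up saliency can be illustrated with a toy sketch. This is not the authors' implementation: the Gaussian row prior, the function names (`contextual_prior`, `modulate`, `argmax`), and the hand-set `expected_row` are all assumptions made for illustration. In the setting the abstract describes, the expected object location would be inferred from global scene statistics rather than fixed by hand.

```python
import math

def contextual_prior(height, width, expected_row, row_sigma):
    """Hypothetical prior: Gaussian over rows, uniform over columns."""
    row_weights = [
        math.exp(-0.5 * ((r - expected_row) / row_sigma) ** 2)
        for r in range(height)
    ]
    return [[row_weights[r]] * width for r in range(height)]

def modulate(saliency, prior):
    """Multiply bottom-up saliency by the prior, then normalize to sum to 1."""
    combined = [
        [s * p for s, p in zip(s_row, p_row)]
        for s_row, p_row in zip(saliency, prior)
    ]
    total = sum(sum(row) for row in combined)
    if total == 0:
        raise ValueError("combined map is all zeros")
    return [[v / total for v in row] for row in combined]

def argmax(grid):
    """Return the (row, col) of the largest value in a 2-D list."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])

# Toy 4x3 bottom-up saliency map: the strongest peak is in the top row,
# where a person is unlikely to appear in this imagined scene.
saliency = [
    [0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6],
    [0.1, 0.1, 0.1],
]

# Assumed context: the target is expected near row 2.
prior = contextual_prior(4, 3, expected_row=2, row_sigma=0.7)
combined = modulate(saliency, prior)

print("bottom-up peak:", argmax(saliency))
print("context-modulated peak:", argmax(combined))
```

In this toy example the prior suppresses the salient but contextually implausible top-row region, so the peak of the combined map moves to the row where the target is expected.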
Pages: 253-256
Number of pages: 4