A brain-inspired object-based attention network for multiobject recognition and visual reasoning

Cited by: 5
Authors
Adeli, Hossein [1 ]
Ahn, Seoyoung [1 ]
Zelinsky, Gregory J. [1 ,2 ]
Affiliations
[1] SUNY Stony Brook, Dept Psychol, Stony Brook, NY USA
[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY USA
Source
JOURNAL OF VISION | 2023, Vol. 23, Issue 05
Keywords
CONVOLUTIONAL NEURAL-NETWORKS; ZOOM LENS; PERCEPTION; MODEL; MECHANISMS; GRADIENT; TASK;
DOI
10.1167/jov.23.5.16
Chinese Library Classification
R77 [Ophthalmology]
Discipline code
100212
Abstract
The visual system uses sequences of selective glimpses to objects to support goal-directed behavior, but how is this attention control learned? Here we present an encoder-decoder model inspired by the interacting bottom-up and top-down visual pathways making up the recognition-attention system in the brain. At every iteration, a new glimpse is taken from the image and is processed through the "what" encoder, a hierarchy of feedforward, recurrent, and capsule layers, to obtain an object-centric (object-file) representation. This representation feeds to the "where" decoder, where the evolving recurrent representation provides top-down attentional modulation to plan subsequent glimpses and impact routing in the encoder. We demonstrate how the attention mechanism significantly improves the accuracy of classifying highly overlapping digits. In a visual reasoning task requiring comparison of two objects, our model achieves near-perfect accuracy and significantly outperforms larger models in generalizing to unseen stimuli. Our work demonstrates the benefits of object-based attention mechanisms taking sequential glimpses of objects.
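The iterative "what"/"where" loop the abstract describes can be sketched in toy form. This is a minimal illustration, not the authors' implementation: the encoder and decoder below are random linear stand-ins for the feedforward/recurrent/capsule hierarchy and the attentional priority map, and all names, shapes, and glimpse sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def crop_glimpse(image, center, size=7):
    # Extract a square glimpse around `center`, zero-padding at image borders.
    r, c = center
    h, w = image.shape
    r0, c0 = max(0, r - size // 2), max(0, c - size // 2)
    r1, c1 = min(h, r0 + size), min(w, c0 + size)
    patch = np.zeros((size, size))
    patch[:r1 - r0, :c1 - c0] = image[r0:r1, c0:c1]
    return patch

def what_encoder(glimpse, state, W):
    # Stand-in for the encoder hierarchy: fold the new glimpse into a
    # recurrent object-centric ("object-file") vector.
    return np.tanh(W @ glimpse.ravel() + state)

def where_decoder(state, image_shape, V):
    # Map the evolving object representation to a spatial priority map
    # and pick its peak as the next glimpse location (top-down attention).
    priority = (V @ state).reshape(image_shape)
    return np.unravel_index(np.argmax(priority), image_shape)

image = rng.random((28, 28))            # toy input image
state = np.zeros(16)                    # recurrent object-centric state
W = rng.standard_normal((16, 49)) * 0.1  # illustrative encoder weights
V = rng.standard_normal((28 * 28, 16)) * 0.1  # illustrative decoder weights
center = (14, 14)                       # start glimpsing at the image center

for _ in range(3):  # a few glimpse iterations
    glimpse = crop_glimpse(image, center)
    state = what_encoder(glimpse, state, W)        # "what" pathway
    center = where_decoder(state, image.shape, V)  # "where" pathway
```

In the actual model these components are trained end to end, so the decoder's priority map learns to plan glimpses that improve recognition rather than simply tracking a fixed linear readout as here.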
Pages: 17