High-Level Visual Encoding Model Framework with Hierarchical Ventral Stream-Optimized Neural Networks

Cited by: 0
Authors
Xiao, Wulue [1, 2]
Li, Jingwei [2]
Zhang, Chi [2]
Wang, Linyuan [2]
Chen, Panpan [2]
Yu, Ziya [2]
Tong, Li [2]
Yan, Bin [2]
Affiliations
[1] Zhengzhou Univ, Sch Cyber Sci & Engn, Zhengzhou 450001, Peoples R China
[2] PLA Strateg Support Force Informat Engn Univ, Henan Key Lab Imaging & Intelligent Proc, Zhengzhou 450001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
fMRI; encoding model; deep neural networks; ventral stream; hierarchical representations; NATURAL IMAGES; REPRESENTATIONS; COMPLEX; REVEAL;
DOI
10.3390/brainsci12081101
Chinese Library Classification number
Q189 [Neuroscience];
Discipline classification code
071006;
Abstract
Visual encoding models based on deep neural networks (DNNs) perform well in predicting brain activity in low-level visual areas. However, because the amount of available neural data is limited, DNN-based visual encoding models are difficult to fit for high-level visual areas, resulting in insufficient encoding performance. The hierarchical organization of the ventral stream suggests that higher visual areas receive information from lower visual areas, which is not fully reflected in current encoding models. In the present study, we propose a novel visual encoding model framework that uses the hierarchy of representations in the ventral stream to improve the model's performance in high-level visual areas. Under this framework, we propose two categories of hierarchical encoding models, from the voxel and the feature perspectives, to realize the hierarchical representations. From the voxel perspective, we first construct an encoding model for a low-level visual area (V1 or V2) and extract the voxel space predicted by the model. We then use the extracted voxel space of the low-level visual area to predict the voxel space of a high-level visual area (V4 or LO) by constructing a voxel-to-voxel model. From the feature perspective, the feature space of the first model is extracted to predict the voxel space of the high-level visual area. The experimental results show that both categories of hierarchical encoding models effectively improve encoding performance in V4 and LO. In addition, the proportion of best-encoded voxels for the different models in V4 and LO shows that our proposed models have clear advantages in prediction accuracy. We find that the hierarchy of representations in the ventral stream has a positive effect on the performance of the existing model in high-level visual areas.
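The voxel-perspective pipeline described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the data are synthetic, closed-form ridge regression stands in for the DNN-based encoding model, and the names `ridge_fit`, `voxelwise_corr`, and the array sizes are hypothetical.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression weights: (X^T X + alpha*I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def voxelwise_corr(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom


rng = np.random.default_rng(0)
n_train, n_test = 200, 50
n_feat, n_low, n_high = 20, 30, 10  # hypothetical sizes
n_total = n_train + n_test

# Synthetic stand-ins (assumption): image features, low-level voxels (e.g. V1),
# and high-level voxels (e.g. V4) that depend on the low-level voxels.
features = rng.standard_normal((n_total, n_feat))
low = features @ rng.standard_normal((n_feat, n_low))
low += 0.5 * rng.standard_normal((n_total, n_low))
high = low @ rng.standard_normal((n_low, n_high))
high += 0.5 * rng.standard_normal((n_total, n_high))

train, test = slice(0, n_train), slice(n_train, n_total)

# Stage 1: encoding model for the low-level area (features -> low-level voxels).
W_low = ridge_fit(features[train], low[train])
low_pred = features @ W_low  # predicted low-level voxel space

# Stage 2: voxel-to-voxel model (predicted low-level voxels -> high-level voxels).
W_v2v = ridge_fit(low_pred[train], high[train])
high_pred_test = low_pred[test] @ W_v2v

# Evaluate on held-out stimuli with per-voxel correlation.
corr = voxelwise_corr(high[test], high_pred_test)
print("mean held-out correlation in the high-level area:", corr.mean())
```

Note that stage 2 is fitted on the predicted low-level voxel space rather than on measured responses, which matches the description above of extracting "the voxel space predicted by the model". The feature-perspective variant would instead pass the first model's learned feature space to the second stage.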
Pages: 15