Deep learning models fail to capture the configural nature of human shape perception

Citations: 28
Authors
Baker, Nicholas [1]
Elder, James H. [2]
Affiliations
[1] Loyola Univ, Dept Psychol, Chicago, IL 60660 USA
[2] York Univ, Ctr Vis Res, Toronto, ON M3J 1P3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
GESTALT PSYCHOLOGY; LOCAL FEATURES; FACE; INFORMATION; WHOLES; PARTS; CLASSIFICATION; RECOGNITION; INVERSION
DOI
10.1016/j.isci.2022.104913
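Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract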
A hallmark of human object perception is sensitivity to the holistic configuration of the local shape features of an object. Deep convolutional neural networks (DCNNs) are currently the dominant models for object recognition processing in the visual cortex, but do they capture this configural sensitivity? To answer this question, we employed a dataset of animal silhouettes and created a variant of this dataset that disrupts the configuration of each object while preserving local features. While human performance was impacted by this manipulation, DCNN performance was not, indicating insensitivity to object configuration. Modifications to training and architecture to make networks more brain-like did not lead to configural processing, and none of the networks were able to accurately predict trial-by-trial human object judgements. We speculate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition.
Pages: 16