Deep learning models fail to capture the configural nature of human shape perception

Cited by: 28
Authors
Baker, Nicholas [1]
Elder, James H. [2]
Affiliations
[1] Loyola Univ, Dept Psychol, Chicago, IL 60660 USA
[2] York Univ, Ctr Vis Res, Toronto, ON M3J 1P3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
GESTALT PSYCHOLOGY; LOCAL FEATURES; FACE; INFORMATION; WHOLES; PARTS; CLASSIFICATION; RECOGNITION; INVERSION;
DOI
10.1016/j.isci.2022.104913
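Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Discipline classification codes
07; 0710; 09
Abstract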
A hallmark of human object perception is sensitivity to the holistic configuration of the local shape features of an object. Deep convolutional neural networks (DCNNs) are currently the dominant models for object recognition processing in the visual cortex, but do they capture this configural sensitivity? To answer this question, we employed a dataset of animal silhouettes and created a variant of this dataset that disrupts the configuration of each object while preserving local features. While human performance was impacted by this manipulation, DCNN performance was not, indicating insensitivity to object configuration. Modifications to training and architecture to make networks more brain-like did not lead to configural processing, and none of the networks were able to accurately predict trial-by-trial human object judgements. We speculate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition.
Pages: 16