A too-good-to-be-true prior to reduce shortcut reliance

Cited by: 13
Authors
Dagaev, Nikolay [1 ,2 ]
Roads, Brett D. [3 ]
Luo, Xiaoliang [3 ]
Barry, Daniel N. [3 ]
Patil, Kaustubh R. [4 ,5 ]
Love, Bradley C. [3 ,6 ]
Affiliations
[1] HSE Univ, Sch Psychol, Moscow, Russia
[2] UCL, Dept Comp Sci, London, England
[3] UCL, Dept Expt Psychol, London, England
[4] Res Ctr Julich, Inst Neurosci & Med Brain & Behav INM 7, Julich, Germany
[5] Heinrich Heine Univ Dusseldorf, Inst Syst Neurosci, Med Fac, Dusseldorf, Germany
[6] Alan Turing Inst, London, England
Funding
Wellcome Trust (UK);
Keywords
Shortcut learning; Out-of-distribution generalization; Robustness; Deep learning;
DOI
10.1016/j.patrec.2022.12.010
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite their impressive performance in object recognition and other tasks under standard testing conditions, deep networks often fail to generalize to out-of-distribution (o.o.d.) samples. One cause for this shortcoming is that modern architectures tend to rely on "shortcuts", superficial features that correlate with categories without capturing deeper invariants that hold across contexts. Real-world concepts often possess a complex structure that can vary superficially across contexts, which can make the most intuitive and promising solutions in one context not generalize to others. One potential way to improve o.o.d. generalization is to assume simple solutions are unlikely to be valid across contexts and avoid them, which we refer to as the too-good-to-be-true prior. A low-capacity network (LCN) with a shallow architecture should only be able to learn surface relationships, including shortcuts. We find that LCNs can serve as shortcut detectors. Furthermore, an LCN's predictions can be used in a two-stage approach to encourage a high-capacity network (HCN) to rely on deeper invariant features that should generalize broadly. In particular, items that the LCN can master are downweighted when training the HCN. Using a modified version of the CIFAR-10 dataset in which we introduced shortcuts, we found that the two-stage LCN-HCN approach reduced reliance on shortcuts and facilitated o.o.d. generalization. (c) 2022 Published by Elsevier B.V.
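The two-stage LCN-HCN idea described in the abstract can be illustrated with a minimal PyTorch-style sketch. This is an assumption-laden illustration, not the authors' implementation: the downweighting rule (one minus the LCN's probability for the true class) and all function names are hypothetical choices made here to show the general shape of the approach.

import torch
import torch.nn.functional as F

def train_lcn(lcn, loader, epochs=5, lr=1e-3):
    # Stage 1: fit the low-capacity network; it can only pick up
    # surface relationships, so items it masters are shortcut suspects.
    opt = torch.optim.Adam(lcn.parameters(), lr=lr)
    lcn.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(lcn(x), y).backward()
            opt.step()
    return lcn

@torch.no_grad()
def shortcut_weights(lcn, x, y):
    # Assumed weighting rule: items the LCN predicts confidently and
    # correctly (probable shortcut carriers) get weights near zero.
    lcn.eval()
    probs = F.softmax(lcn(x), dim=1)
    p_true = probs.gather(1, y.unsqueeze(1)).squeeze(1)
    return 1.0 - p_true

def train_hcn(hcn, lcn, loader, epochs=20, lr=1e-3):
    # Stage 2: train the high-capacity network with the per-item
    # weights, downweighting the LCN-masterable examples.
    opt = torch.optim.Adam(hcn.parameters(), lr=lr)
    hcn.train()
    for _ in range(epochs):
        for x, y in loader:
            w = shortcut_weights(lcn, x, y)
            loss = (w * F.cross_entropy(hcn(x), y, reduction="none")).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return hcn

Used as hcn = train_hcn(hcn, train_lcn(lcn, loader), loader), the HCN sees shortcut-bearing items with low weight, nudging it toward the deeper invariant features the abstract argues should generalize across contexts.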
Pages: 164-171
Number of pages: 8