Concept whitening for interpretable image recognition

Cited by: 177
Authors
Chen, Zhi [1]
Bei, Yijie [2]
Rudin, Cynthia [1, 2]
Affiliations
[1] Duke Univ, Dept Comp Sci, Durham, NC 27706 USA
[2] Duke Univ, Dept Elect & Comp Engn, Durham, NC USA
Funding
National Science Foundation (USA);
Keywords
Convolutional neural networks - Network layers - Multilayer neural networks;
DOI
10.1038/s42256-020-00265-z
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
What does a neural network encode about a concept as we traverse through the layers? Interpretability in machine learning is undoubtedly important, but the calculations of neural networks are very challenging to understand. Attempts to see inside their hidden layers can be misleading, unusable or rely on the latent space to possess properties that it may not have. Here, rather than attempting to analyse a neural network post hoc, we introduce a mechanism, called concept whitening (CW), to alter a given layer of the network to allow us to better understand the computation leading up to that layer. When a concept whitening module is added to a convolutional neural network, the latent space is whitened (that is, decorrelated and normalized) and the axes of the latent space are aligned with known concepts of interest. By experiment, we show that CW can provide us with a much clearer understanding of how the network gradually learns concepts over layers. CW is an alternative to a batch normalization layer in that it normalizes, and also decorrelates (whitens), the latent space. CW can be used in any layer of the network without hurting predictive performance. There is much interest in 'explainable' AI, but most efforts concern post hoc methods. Instead, a neural network can be made inherently interpretable, with an approach that involves making human-understandable concepts (aeroplane, bed, lamp and so on) align along the axes of its latent space.
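The whitening step named in the abstract (decorrelating and normalizing the latent space) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `zca_whiten` and `random_rotation`, the toy data, and the `eps` stabilizer are assumptions made for illustration. The abstract does not say how the axes are aligned with concepts, so a random orthogonal matrix stands in for that rotation here, only to show that rotating a whitened space keeps it whitened.

```python
import numpy as np

def zca_whiten(z, eps=1e-5):
    """Whiten latent vectors z of shape (n_samples, n_features).

    Output features have zero mean, unit variance and no correlation.
    eps is an assumed stabilizer that avoids dividing by tiny eigenvalues.
    """
    zc = z - z.mean(axis=0, keepdims=True)       # center each feature
    cov = zc.T @ zc / z.shape[0]                 # empirical covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # cov is symmetric
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return zc @ w

def random_rotation(d, seed=0):
    """Hypothetical stand-in for a learned concept-aligning rotation."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q                                     # orthogonal d x d matrix

# Toy correlated "latent" features (illustrative data only).
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.5, 0.0],
                   [0.0, 0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.0, 1.0]])
z = rng.standard_normal((2000, 4)) @ mixing

zw = zca_whiten(z)                 # whitened latent space
q = random_rotation(4)
zr = zw @ q                        # rotated, still whitened

cov_w = zw.T @ zw / zw.shape[0]    # approximately the identity
cov_r = zr.T @ zr / zr.shape[0]    # approximately the identity
```

Because an orthogonal rotation leaves an identity covariance unchanged, the choice of rotation is free to serve interpretability without undoing the whitening.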
Pages: 772-782
Page count: 12