Detail-Preserving Pooling in Deep Networks

Cited by: 88
Authors
Saeedan, Faraz [1]
Weber, Nicolas [1, 2]
Goesele, Michael [1, 3]
Roth, Stefan [1]
Affiliations
[1] Tech Univ Darmstadt, Darmstadt, Germany
[2] NEC Labs Europe, Heidelberg, Germany
[3] Oculus Res, Garner, NC USA
Source
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2018
DOI
10.1109/CVPR.2018.00949
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most convolutional neural networks use some method for gradually downscaling the size of the hidden layers. This is commonly referred to as pooling, and is applied to reduce the number of parameters, improve invariance to certain distortions, and increase the receptive field size. Since pooling by nature is a lossy process, it is crucial that each such layer maintains the portion of the activations that is most important for the network's discriminability. Yet, simple maximization or averaging over blocks, max or average pooling, or plain downsampling in the form of strided convolutions are the standard. In this paper, we aim to leverage recent results on image downscaling for the purposes of deep learning. Inspired by the human visual system, which focuses on local spatial changes, we propose detail-preserving pooling (DPP), an adaptive pooling method that magnifies spatial changes and preserves important structural detail. Importantly, its parameters can be learned jointly with the rest of the network. We analyze some of its theoretical properties and show its empirical benefits on several datasets and networks, where DPP consistently outperforms previous pooling approaches.
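The standard downscaling operations named in the abstract (max pooling, average pooling, and plain strided downsampling) can be sketched in a few lines of pure Python. The last function, `detail_weighted_pool`, is a hypothetical illustration of the general idea of giving more weight to values that differ from their local surroundings; its name, its parameters `alpha` and `lam`, and its weighting rule are assumptions made for this sketch and are not taken from the paper's DPP formulation.

```python
# Illustrative k x k, stride-k pooling on a small 2-D activation map.
# Only the first three reducers correspond to operations named in the abstract;
# detail_weighted_pool is a hypothetical sketch, NOT the paper's DPP definition.

def blocks(x, k=2):
    """Yield (out_row, out_col, values) for each non-overlapping k x k block."""
    for i in range(0, len(x) - k + 1, k):
        for j in range(0, len(x[0]) - k + 1, k):
            vals = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            yield i // k, j // k, vals

def pool(x, reduce_fn, k=2):
    """Apply reduce_fn to every k x k block and return the downscaled map."""
    out = [[0.0] * (len(x[0]) // k) for _ in range(len(x) // k)]
    for r, c, vals in blocks(x, k):
        out[r][c] = reduce_fn(vals)
    return out

def max_pool(x, k=2):
    return pool(x, max, k)

def avg_pool(x, k=2):
    return pool(x, lambda v: sum(v) / len(v), k)

def strided(x, k=2):
    """Plain downsampling: keep only the top-left value of each block."""
    return pool(x, lambda v: v[0], k)

def detail_weighted_pool(x, alpha=1.0, lam=1.0, k=2):
    """Hypothetical: weighted average with weight alpha + |value - block mean| ** lam."""
    def reduce_fn(v):
        mean = sum(v) / len(v)
        w = [alpha + abs(a - mean) ** lam for a in v]
        return sum(wi * a for wi, a in zip(w, v)) / sum(w)
    return pool(x, reduce_fn, k)

# Example 4 x 4 map: the top-right block contains one isolated large activation.
x = [
    [1, 2, 0, 0],
    [3, 4, 0, 8],
    [5, 5, 1, 1],
    [5, 5, 1, 1],
]
```

In this sketch, the weighted result for a block with one outlying value falls between the average and the maximum, while a constant block is returned unchanged. In a network, parameters such as `alpha` and `lam` would be trained rather than fixed by hand, in line with the abstract's statement that the pooling parameters can be learned jointly with the rest of the network.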
Pages: 9108-9116
Number of pages: 9