Delving deep into spatial pooling for squeeze-and-excitation networks

Cited by: 126
Authors
Jin, Xin [4 ]
Xie, Yanping [1 ,2 ,5 ]
Wei, Xiu-Shen [1 ,2 ,3 ]
Zhao, Bo-Rui [4 ]
Chen, Zhao-Min [4 ]
Tan, Xiaoyang [5 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, PCA Lab,Minist Educ, Nanjing, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Jiangsu Key Lab Image & Video Understanding Socia, Nanjing, Peoples R China
[3] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[4] Megvii Technol, Megvii Res Nanjing, Nanjing, Peoples R China
[5] Nanjing Univ Aeronaut & Astronaut, Dept Comp Sci & Technol, Nanjing, Peoples R China
Keywords
Convolutional neural networks; Squeeze-and-excitation; Spatial pooling; Base model; IMAGE; CLASSIFICATION; ATTENTION;
DOI
10.1016/j.patcog.2021.108159
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Squeeze-and-Excitation (SE) blocks have demonstrated significant accuracy gains for state-of-the-art deep architectures by re-weighting channel-wise feature responses. The SE block is an architectural unit that integrates two operations: a squeeze operation that employs global average pooling to aggregate spatial convolutional features into a channel feature, and an excitation operation that learns instance-specific channel weights from the squeezed feature to re-weight each channel. In this paper, we revisit the squeeze operation in SE blocks, and shed light on why and how to embed rich (both global and local) information into the excitation module at minimal extra cost. In particular, we introduce a simple but effective two-stage spatial pooling process: rich descriptor extraction and information fusion. The rich descriptor extraction step aims to obtain a set of diverse (i.e., global and especially local) deep descriptors that contain more informative cues than global average pooling. In turn, absorbing the additional information delivered by these descriptors via a fusion step helps the excitation operation return more accurate re-weighting scores in a data-driven manner. We validate the effectiveness of our method through extensive experiments on ImageNet for image classification and on MS-COCO for object detection and instance segmentation. In these experiments, our method achieves consistent improvements over SENets on all tasks, in some cases by a large margin. (c) 2021 Published by Elsevier Ltd.
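The squeeze-and-excitation mechanism summarized above can be sketched as a minimal NumPy illustration. This is an assumption-laden toy: the weights here are random for demonstration, whereas a real SE block learns them; the two fully connected layers with ReLU and sigmoid follow the standard SENet design, not the paper's richer two-stage pooling variant.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def se_block(feat, w1, w2):
    """Re-weight the channels of a (C, H, W) feature map, SE-style.

    Squeeze: global average pooling collapses each channel to one scalar.
    Excitation: a two-layer bottleneck MLP maps the squeezed vector to
    per-channel gates in (0, 1), which rescale the original channels.
    """
    squeezed = feat.mean(axis=(1, 2))          # (C,) global average pooling
    gates = sigmoid(w2 @ relu(w1 @ squeezed))  # (C,) channel weights in (0, 1)
    return feat * gates[:, None, None]         # broadcast channel re-weighting

# Illustrative shapes; r is the bottleneck reduction ratio.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1   # random stand-ins for the
w2 = rng.standard_normal((C, C // r)) * 0.1   # learned FC-layer weights
y = se_block(x, w1, w2)
print(y.shape)  # same (C, H, W) shape as the input
```

Because the sigmoid gates lie strictly in (0, 1), the block can only attenuate channels relative to the input; the gates themselves are what the excitation operation learns per instance.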
Pages: 12