Delving deep into spatial pooling for squeeze-and-excitation networks

Cited by: 126
Authors
Jin, Xin [4 ]
Xie, Yanping [1 ,2 ,5 ]
Wei, Xiu-Shen [1 ,2 ,3 ]
Zhao, Bo-Rui [4 ]
Chen, Zhao-Min [4 ]
Tan, Xiaoyang [5 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Key Lab Intelligent Percept & Syst High Dimens In, PCA Lab,Minist Educ, Nanjing, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Jiangsu Key Lab Image & Video Understanding Socia, Nanjing, Peoples R China
[3] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[4] Megvii Technol, Megvii Res Nanjing, Nanjing, Peoples R China
[5] Nanjing Univ Aeronaut & Astronaut, Dept Comp Sci & Technol, Nanjing, Peoples R China
Keywords
Convolutional neural networks; Squeeze-and-excitation; Spatial pooling; Base model; IMAGE; CLASSIFICATION; ATTENTION;
DOI
10.1016/j.patcog.2021.108159
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Squeeze-and-Excitation (SE) blocks have demonstrated significant accuracy gains for state-of-the-art deep architectures by re-weighting channel-wise feature responses. The SE block is an architectural unit that integrates two operations: a squeeze operation that employs global average pooling to aggregate spatial convolutional features into a channel feature, and an excitation operation that learns instance-specific channel weights from the squeezed feature to re-weight each channel. In this paper, we revisit the squeeze operation in SE blocks, and shed light on why and how to embed rich (both global and local) information into the excitation module at minimal extra cost. In particular, we introduce a simple but effective two-stage spatial pooling process: rich descriptor extraction and information fusion. The rich descriptor extraction step aims to obtain a set of diverse (i.e., global and especially local) deep descriptors that contain more informative cues than global average pooling. In turn, absorbing the additional information delivered by these descriptors via a fusion step helps the excitation operation return more accurate re-weighting scores in a data-driven manner. We validate the effectiveness of our method through extensive experiments on ImageNet for image classification and on MS-COCO for object detection and instance segmentation. In these experiments, our method achieves consistent improvements over SENets on all tasks, in some cases by a large margin. (c) 2021 Published by Elsevier Ltd.
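The squeeze-and-excitation mechanism summarized above can be sketched as a minimal NumPy illustration. This is an assumption-laden toy: the weights here are random for demonstration, whereas a real SE block learns them; the two fully connected layers with ReLU and sigmoid follow the standard SENet design, not the paper's richer two-stage pooling variant.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def se_block(feat, w1, w2):
    """Re-weight the channels of a (C, H, W) feature map, SE-style.

    Squeeze: global average pooling collapses each channel to one scalar.
    Excitation: a two-layer bottleneck MLP maps the squeezed vector to
    per-channel gates in (0, 1), which rescale the original channels.
    """
    squeezed = feat.mean(axis=(1, 2))          # (C,) global average pooling
    gates = sigmoid(w2 @ relu(w1 @ squeezed))  # (C,) channel weights in (0, 1)
    return feat * gates[:, None, None]         # broadcast channel re-weighting

# Illustrative shapes; r is the bottleneck reduction ratio.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1   # random stand-ins for the
w2 = rng.standard_normal((C, C // r)) * 0.1   # learned FC-layer weights
y = se_block(x, w1, w2)
print(y.shape)  # same (C, H, W) shape as the input
```

Because the sigmoid gates lie strictly in (0, 1), the block can only attenuate channels relative to the input; the gates themselves are what the excitation operation learns per instance.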
Pages: 12