Squeeze-and-Excitation Networks

Cited: 544
Authors
Hu, Jie [1 ,2 ,3 ]
Shen, Li [7 ]
Albanie, Samuel [7 ]
Sun, Gang [3 ,6 ]
Wu, Enhua [1 ,2 ,4 ,5 ]
Affiliations
[1] Chinese Acad Sci, Inst Software, State Key Lab Comp Sci, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Momenta, Dongsheng Plaza A,8 Zhongguancun East Rd, Beijing 100083, Peoples R China
[4] Univ Macau, Fac Sci & Technol, Taipa, Macao, Peoples R China
[5] Univ Macau, AI Ctr, Taipa, Macao, Peoples R China
[6] Chinese Acad Sci, Inst Automat, LIAMA NLPR, Beijing 100190, Peoples R China
[7] Univ Oxford, Visual Geometry Grp, Oxford OX1 2JD, England
Funding
National Key Research and Development Program of China; UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Squeeze-and-excitation; image representations; attention; convolutional neural networks; visual attention; model
DOI
10.1109/TPAMI.2019.2913372
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251 percent, surpassing the winning entry of 2016 by a relative improvement of ~25 percent. Models and code are available at https://github.com/hujie-frank/SENet.
Pages: 2011-2023
Page count: 13