Gaussian Dynamic Convolution for Efficient Single-Image Segmentation

Cited by: 43
Authors
Sun, Xin [1, 2]
Chen, Changrui [3]
Wang, Xiaorui [1]
Dong, Junyu [1]
Zhou, Huiyu [4]
Chen, Sheng [5]
Affiliations
[1] Ocean Univ China, Dept Comp Sci & Technol, Qingdao 266100, Shandong, Peoples R China
[2] Tech Univ Munich, Dept Aerosp & Geodesy, D-80333 Munich, Germany
[3] Univ Warwick, WMG Data Sci, Coventry CV4 7AL, W Midlands, England
[4] Univ Leicester, Dept Informat, Leicester LE1 7RH, Leics, England
[5] Univ Southampton, Sch Elect & Comp Sci, Southampton SO17 1BJ, Hants, England
Funding
National Natural Science Foundation of China;
Keywords
Image segmentation; Convolution; Task analysis; Semantics; Feature extraction; Training; Kernel; convolutional neural networks; weakly supervised learning; dynamic receptive field; SEMANTIC SEGMENTATION;
DOI
10.1109/TCSVT.2021.3096814
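Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification codes
0808; 0809;
Abstract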
Interactive single-image segmentation is ubiquitous in scientific and commercial imaging software. A lightweight neural network is one practical and effective way to accomplish the single-image segmentation task. This work focuses on the single-image segmentation problem given only a few seeds, such as scribbles. Inspired by the dynamic receptive field in the human visual system, we propose the Gaussian dynamic convolution (GDC) to aggregate contextual information for neural networks quickly and efficiently. The core idea is to randomly select the spatial sampling area according to offsets drawn from a Gaussian distribution. Our GDC can be easily used as a module to build lightweight or complex segmentation networks. We adopt the proposed GDC to address typical single-image segmentation tasks. Furthermore, we also build a Gaussian dynamic pyramid pooling module to show its potential and generality in common semantic segmentation. Experiments demonstrate that the GDC outperforms other existing convolutions on three benchmark segmentation datasets: Pascal-Context, Pascal-VOC 2012, and Cityscapes. Additional experiments illustrate that the GDC can produce richer and more vivid features than other convolutions. In general, our GDC helps convolutional neural networks form an overall impression of the image.
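The abstract only states the core idea (sampling positions chosen through Gaussian-distributed offsets), so the following is a minimal NumPy sketch of one possible reading, not the authors' implementation. It assumes a single-channel feature map, a 3x3 kernel, and a single dilation drawn per call from a half-Gaussian; the function names and the parameters sigma and max_dilation are hypothetical.

```python
import numpy as np


def sample_dilation(rng, sigma, max_dilation):
    """Draw a dilation >= 1 from a half-Gaussian (assumed sampling scheme)."""
    d = 1 + int(round(abs(rng.normal(0.0, sigma))))
    return min(d, max_dilation)


def dilated_conv3x3(feature, kernel, dilation):
    """3x3 cross-correlation with zero padding and the given dilation."""
    h, w = feature.shape
    padded = np.pad(feature, dilation, mode="constant")
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            # Tap (i, j) sits (i - 1) * dilation rows and (j - 1) * dilation
            # columns away from the centre; padding shifts this to i * dilation.
            dy, dx = i * dilation, j * dilation
            out += kernel[i, j] * padded[dy:dy + h, dx:dx + w]
    return out


def gaussian_dynamic_conv(feature, kernel, sigma=2.0, max_dilation=8, rng=None):
    """Apply a 3x3 kernel whose receptive field is resampled on every call."""
    rng = np.random.default_rng() if rng is None else rng
    d = sample_dilation(rng, sigma, max_dilation)
    return dilated_conv3x3(feature, kernel, d), d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.random((16, 16))
    kernel = np.full((3, 3), 1.0 / 9.0)  # simple averaging kernel
    out, d = gaussian_dynamic_conv(feature, kernel, rng=rng)
    print("sampled dilation:", d, "output shape:", out.shape)
```

Under these assumptions, small dilations are sampled most often and large ones occasionally, so repeated calls mix local and wider context without adding kernel parameters.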
Pages: 2937-2948
Page count: 12