A Strip Dilated Convolutional Network for Semantic Segmentation

Cited by: 7

Authors:

Zhou, Yan [1]

Zheng, Xihong [1]

Ouyang, Wanli [2]

Li, Baopu [3]

Affiliations:

[1] Xiangtan Univ, Sch Automat & Elect Informat, Xiangtan 411105, Peoples R China

[2] Univ Sydney, Sch Elect & Informat, Camperdown, NSW 2006, Australia

[3] Baidu Res USA, Sunnyvale, CA 94089 USA

Source:

NEURAL PROCESSING LETTERS | 2023 / Vol. 55 / Issue 04

Funding:

National Natural Science Foundation of China;

Keywords:

Semantic segmentation; Multi-scale contexts; Encoder-decoder; Multi-scale strip pooling module; Strip dilated convolution module; ATTENTION;

DOI:

10.1007/s11063-022-11048-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Segmentation scenes frequently contain a large number of strip-shaped objects, and conventional square convolution may yield redundant information when applied to them. Building on our previously proposed SA-FFNet (Zhou et al. in Neurocomputing 453:50-59, 2021), we study the effect of strip sub-region information extraction on semantic segmentation and propose a strip dilated convolutional network. The method is conducive to extracting the multi-scale strip objects that often appear in segmentation scenes, and it uses strip dilated convolution to further extract contextual dependencies in other directions. First, we propose a multi-scale strip pooling module that enables the backbone network to effectively obtain multi-scale contexts. Then, we introduce a strip dilated convolution module, which supplements the vertical contexts of the strip pooling by using strip dilated convolution. Finally, we construct a novel network integrating the two proposed modules. The method explicitly takes the horizontal and vertical contexts of multi-scale strip objects into consideration, so that scene understanding can benefit from long-range dependencies. Experimental results on the widely used PASCAL VOC 2012 and Cityscapes scene analysis benchmark datasets are better than those of existing methods such as OCRNet, DeepLabV3+, and SPNet, both qualitatively and quantitatively.
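The two operations named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no layer configuration, so the function names `strip_pool` and `strip_dilated_conv`, the single-channel list-of-lists feature map, the zero padding, and the fixed kernel weights are all assumptions made for illustration. The sketch only shows the general idea of averaging along whole rows or columns (strip pooling) and of applying a one-dimensional kernel whose taps are spaced apart by a dilation factor (strip dilated convolution).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def strip_pool(feature: Matrix) -> Tuple[List[float], List[float]]:
    """Average every row (horizontal strip) and every column (vertical strip).

    Hypothetical helper: returns (row_means, col_means) for one channel.
    """
    h, w = len(feature), len(feature[0])
    row_means = [sum(row) / w for row in feature]
    col_means = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]
    return row_means, col_means


def strip_dilated_conv(feature: Matrix, kernel: List[float],
                       dilation: int, vertical: bool = True) -> Matrix:
    """Apply a k x 1 (vertical) or 1 x k (horizontal) kernel with dilation.

    Assumptions: zero padding, stride 1, output has the same size as input.
    """
    h, w = len(feature), len(feature[0])
    center = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for t, weight in enumerate(kernel):
                # Kernel taps are spaced `dilation` pixels apart along one axis.
                offset = (t - center) * dilation
                ii, jj = (i + offset, j) if vertical else (i, j + offset)
                if 0 <= ii < h and 0 <= jj < w:  # taps outside the map read as zero
                    acc += weight * feature[ii][jj]
            out[i][j] = acc
    return out


if __name__ == "__main__":
    fmap = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    print(strip_pool(fmap))
    column = [[1.0], [2.0], [3.0], [4.0], [5.0]]
    print(strip_dilated_conv(column, [1.0, 1.0, 1.0], dilation=2))
```

With a dilation of 2, a three-tap vertical kernel at the middle of a five-row column reads rows 0, 2, and 4, which is how a narrow kernel can cover a long vertical extent without extra weights.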

Pages: 4439-4459

Page count: 21

References (53 in total):

[1] Weakly supervised semantic segmentation by iteratively refining optimal segmentation with deep cues guidance [J].

Al-Huda, Zaid; Peng, Bo; Yang, Yan; Algburi, Riyadh Nazar Ali; Ahmad, Muqeet; Khurshid, Faisal; Moghalles, Khaled.

NEURAL COMPUTING & APPLICATIONS, 2021, 33 (15) :9035-9060

[2] [Anonymous], 2015, CVPR Workshop on the Future of Datasets in Vision

[3] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J].

Badrinarayanan, Vijay; Kendall, Alex; Cipolla, Roberto.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) :2481-2495

[4] Information aggregation and fusion in deep neural networks for object interaction exploration for semantic segmentation [J].

Bai, Shuang; Wang, Congcong.

KNOWLEDGE-BASED SYSTEMS, 2021, 218

[5] Depth-Quality-Aware Salient Object Detection [J].

Chen, Chenglizhao; Wei, Jipeng; Peng, Chong; Qin, Hong.

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 :2350-2363

[6] Improved Saliency Detection in RGB-D Images Using Two-Phase Depth Estimation and Selective Deep Fusion [J].

Chen, Chenglizhao; Wei, Jipeng; Peng, Chong; Zhang, Weizhong; Qin, Hong.

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 :4296-4307

[7] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [J].

Chen, Liang-Chieh; Zhu, Yukun; Papandreou, George; Schroff, Florian; Adam, Hartwig.

COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 :833-851

[8] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].

Chen, Liang-Chieh; Papandreou, George; Kokkinos, Iasonas; Murphy, Kevin; Yuille, Alan L.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848

[9] Chen LB, 2017, IEEE INT SYMP NANO, P1, DOI 10.1109/NANOARCH.2017.8053709

[10] Deformable Convolutional Networks [J].

Dai, Jifeng; Qi, Haozhi; Xiong, Yuwen; Li, Yi; Zhang, Guodong; Hu, Han; Wei, Yichen.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :764-773
