PSCNET: EFFICIENT RGB-D SEMANTIC SEGMENTATION PARALLEL NETWORK BASED ON SPATIAL AND CHANNEL ATTENTION

Cited: 5
Authors
Du, S. Q. [1 ,2 ]
Tang, S. J. [1 ,2 ]
Wang, W. X. [1 ,2 ]
Li, X. M. [1 ,2 ]
Lu, Y. H. [3 ]
Guo, R. Z. [1 ,2 ]
Affiliations
[1] Shenzhen Univ, Res Inst Smart Cities, Sch Architecture & Urban Planning, Shenzhen, Peoples R China
[2] Minist Nat Resources, Key Lab Urban Land Resources Monitoring & Simulat, Shenzhen, Peoples R China
[3] Wuhan Univ, Sch Resource & Environm Sci, Wuhan 430072, Peoples R China
Source
XXIV ISPRS CONGRESS: IMAGING TODAY, FORESEEING TOMORROW, COMMISSION I | 2022, Vol. 5-1
Funding
National Natural Science Foundation of China;
Keywords
Deep Learning; Semantic Segmentation; RGB-D Fusion; Channel Attention; Spatial Attention;
D O I
10.5194/isprs-annals-V-1-2022-129-2022
Chinese Library Classification
P9 [Physical Geography];
Discipline Classification Code
0705; 070501;
Abstract
RGB-D semantic segmentation is a key technology for indoor semantic map construction. Traditional RGB-D semantic segmentation networks often suffer from redundant parameters and modules. In this paper, an improved semantic segmentation network, PSCNet, is designed to reduce redundant parameters and make the model easier to implement. Based on the DeepLabv3+ framework, we improve the original model in three ways: attention module selection, backbone simplification, and Atrous Spatial Pyramid Pooling (ASPP) module simplification. Concretely, we use spatial-channel co-attention, remove the last module from the depth backbone, and redesign the ASPP as WW-ASPP using depthwise convolution. Compared to DeepLabv3+, the proposed PSCNet has approximately the same number of parameters but achieves a 5% improvement in mIoU. Meanwhile, PSCNet runs inference at 47 FPS on an RTX 3090, which is much faster than state-of-the-art semantic segmentation networks.
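To make the parameter-saving idea behind the WW-ASPP redesign concrete, the following is a minimal sketch comparing the weight count of a standard 3x3 atrous convolution with that of a depthwise-separable replacement (depthwise 3x3 followed by pointwise 1x1). The 256-channel branch width is an assumption for illustration, not a figure taken from the paper; dilation does not change the parameter count, so it is omitted.

```python
def conv3x3_params(c_in: int, c_out: int) -> int:
    """Weights of a standard 3x3 (atrous) convolution, bias omitted."""
    return c_in * c_out * 3 * 3


def depthwise_separable_3x3_params(c_in: int, c_out: int) -> int:
    """Depthwise 3x3 (one filter per input channel) plus pointwise 1x1."""
    return c_in * 3 * 3 + c_in * c_out


if __name__ == "__main__":
    c = 256  # assumed ASPP branch width, for illustration only
    standard = conv3x3_params(c, c)          # 589824
    separable = depthwise_separable_3x3_params(c, c)  # 67840
    print(f"standard: {standard}, separable: {separable}, "
          f"ratio: {standard / separable:.1f}x")
```

At this width the depthwise-separable branch uses roughly 8.7x fewer weights per atrous branch, which is consistent with the abstract's claim that the ASPP can be simplified without growing the overall parameter budget.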
Pages: 129-136 (8 pages)