Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation

Cited by: 70

Authors:

Peng, Chengli [1]

Tian, Tian [2]

Chen, Chen [3]

Guo, Xiaojie [4]

Ma, Jiayi [1]

Affiliations:

[1] Wuhan Univ, Elect Informat Sch, Wuhan 430072, Peoples R China

[2] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China

[3] Univ North Carolina, Dept Elect & Comp Engn, Charlotte, NC 28223 USA

[4] Tianjin Univ, Sch Comp Software, Tianjin 300350, Peoples R China

Source:

Neural Networks | 2021 / Vol. 137

Keywords:

Semantic segmentation; Real time; Deep learning; Attention mechanism; NETWORK;

DOI:

10.1016/j.neunet.2021.01.021

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The encoder-decoder structure has been introduced into semantic segmentation to improve the spatial accuracy of the network by fusing high- and low-level feature maps. However, recent state-of-the-art encoder-decoder-based methods can hardly meet real-time requirements because of their complex and inefficient decoders. To address this issue, we propose a lightweight bilateral attention decoder for real-time semantic segmentation. It consists of two blocks and fuses feature maps from different levels in two steps: information refinement and information fusion. In the first step, we propose a channel attention branch to refine the high-level feature maps and a spatial attention branch to refine the low-level ones. The refined high-level feature maps can capture more exact semantic information and the refined low-level ones can capture more accurate spatial information, which significantly improves the information-capturing ability of these feature maps. In the second step, we develop a new fusion module, named the pooling fusing block, to fuse the refined high- and low-level feature maps. This fusion block can take full advantage of the high- and low-level feature maps, leading to high-quality fusion results. To verify the efficiency of the proposed bilateral attention decoder, we adopt a lightweight network as the backbone and compare our method with other state-of-the-art real-time semantic segmentation methods on the Cityscapes and CamVid datasets. Experimental results demonstrate that our method can achieve better performance with a higher inference speed. Moreover, we compare our network with several state-of-the-art non-real-time semantic segmentation methods and find that it can also attain better segmentation performance. © 2021 Elsevier Ltd. All rights reserved.
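The two-step idea in the abstract (refine, then fuse) can be illustrated with a minimal, weight-free sketch in plain Python. This is not the authors' implementation: the abstract does not describe the internals of either attention branch or of the pooling fusing block, so everything below is an assumption made for illustration. The function names (`channel_attention`, `spatial_attention`, `upsample_nearest`, `refine_and_fuse`) are hypothetical, the gates use parameter-free averages where a real network would use learned layers, and the fusion step is a plain element-wise sum standing in for the pooling fusing block. Feature maps are nested lists indexed as `[channel][row][column]`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a gate computed from its global average.

    Assumption: a parameter-free gate; real channel attention would
    pass the pooled vector through learned layers first.
    """
    refined = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        refined.append([[v * gate for v in row] for row in channel])
    return refined


def spatial_attention(fmap):
    """Scale each pixel by a gate computed from its mean across channels."""
    n_ch = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    gates = [
        [sigmoid(sum(fmap[c][i][j] for c in range(n_ch)) / n_ch)
         for j in range(width)]
        for i in range(height)
    ]
    return [
        [[fmap[c][i][j] * gates[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(n_ch)
    ]


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of every channel by an integer factor."""
    return [
        [[row[j // factor] for j in range(len(row) * factor)]
         for row in channel for _ in range(factor)]
        for channel in fmap
    ]


def refine_and_fuse(high, low, factor):
    """Step 1: refine both inputs. Step 2: fuse them.

    Assumption: fusion is an element-wise sum, used here only as a
    placeholder for the pooling fusing block. Both inputs must have
    the same number of channels, and `low` must be `factor` times
    larger than `high` in each spatial dimension.
    """
    high_up = upsample_nearest(channel_attention(high), factor)
    low_ref = spatial_attention(low)
    return [
        [[h + l for h, l in zip(h_row, l_row)]
         for h_row, l_row in zip(h_ch, l_ch)]
        for h_ch, l_ch in zip(high_up, low_ref)
    ]


# Toy inputs: a 2-channel 1x1 high-level map and a 2-channel 2x2 low-level map.
high = [[[0.0]], [[2.0]]]
low = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
fused = refine_and_fuse(high, low, factor=2)
```

The sketch shows only the division of labour: the channel gate reweights whole channels of the coarse, semantically rich map, while the spatial gate reweights individual locations of the fine, spatially detailed map before the two are combined at the finer resolution.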


Pages: 188-199

Page count: 12
