Self-attention neural architecture search for semantic image segmentation

Cited: 42
Authors
Fan, Zhenkun [1 ]
Hu, Guosheng [2 ]
Sun, Xin [1 ]
Wang, Gaige [1 ]
Dong, Junyu [1 ]
Su, Chi [3 ]
Affiliations
[1] Ocean Univ China, Dept Comp Sci & Technol, Qingdao 266100, Shandong, Peoples R China
[2] Anyvision, London, England
[3] Kingsoft Cloud, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Self-attention; Neural architecture search; Semantic segmentation;
DOI
10.1016/j.knosys.2021.107968
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-attention can capture long-distance dependencies and is widely used in semantic segmentation. Existing methods mainly use two kinds of self-attention, i.e., spatial attention and channel attention, which capture relations in the HW dimension (the image plane: height and width) and the C dimension (channels), respectively. Very little research investigates self-attention along other dimensions, which can potentially improve segmentation performance. In this work, we investigate self-attention along all the possible dimensions {H, W, C, HW, HC, CW, HWC}. We then explore the aggregation of all these self-attentions, applying the neural architecture search (NAS) technique to find the optimal aggregation. Specifically, we carefully design (1) the search space and (2) the optimization method. For (1), we introduce a building block, the basic self-attention search unit (BSU), which can model self-attention along all the dimensions; the search space contains within-BSU and cross-BSU operations. In addition, we propose an attention-map splitting method, which reduces computation by one third. For (2), we apply an efficient differentiable optimization method to search for the optimal aggregation. We conduct extensive experiments on the Cityscapes and ADE20K datasets. The results show the effectiveness of the proposed method, and we achieve very competitive performance against state-of-the-art methods. (c) 2021 Elsevier B.V. All rights reserved.
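To make the idea of self-attention along an arbitrary dimension concrete, the following is a minimal NumPy sketch; it is not the paper's BSU implementation, and the function name and reshaping scheme are illustrative assumptions. Choosing tokens along the HW axes recovers standard spatial attention, while choosing the C axis recovers channel attention; other subsets of {C, H, W} follow the same pattern.

```python
import numpy as np

def self_attention_along(x, dims):
    """Dot-product self-attention over tokens formed along chosen axes.

    x    : feature map of shape (C, H, W)
    dims : string naming the token axes, e.g. "HW" (spatial attention)
           or "C" (channel attention). Hypothetical helper for illustration.
    """
    order = {"C": 0, "H": 1, "W": 2}
    token_axes = [order[d] for d in dims]
    feat_axes = [a for a in range(3) if a not in token_axes]
    perm = token_axes + feat_axes

    # Move token axes to the front, then flatten into (num_tokens, feat_dim).
    xp = np.transpose(x, perm)
    t = int(np.prod([x.shape[a] for a in token_axes]))
    tokens = xp.reshape(t, -1)

    # Softmax-normalized pairwise similarities between tokens.
    logits = tokens @ tokens.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)

    # Re-weight tokens and restore the original (C, H, W) layout.
    out = attn @ tokens
    return out.reshape(xp.shape).transpose(np.argsort(perm))

x = np.random.rand(4, 8, 8).astype(np.float32)
spatial = self_attention_along(x, "HW")  # tokens are the H*W positions
channel = self_attention_along(x, "C")   # tokens are the C channels
```

The output always has the same shape as the input, so attention maps computed along different dimensions can be aggregated (e.g. summed), which is the kind of combination the NAS search in the paper optimizes over.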
Pages: 9