Expectation-Maximization Attention Networks for Semantic Segmentation

被引:565
作者
Li, Xia [1 ,2 ]
Zhong, Zhisheng [2 ]
Wu, Jianlong [2 ,3 ]
Yang, Yibo [4 ]
Lin, Zhouchen [2 ]
Liu, Hong [1 ]
机构
[1] Peking Univ, Shenzhen Grad Sch, Key Lab Machine Percept, Beijing, Peoples R China
[2] Peking Univ, Key Lab Machine Percept MOE, Sch EECS, Beijing, Peoples R China
[3] Shandong Univ, Sch Comp Sci & Technol, Jinan, Peoples R China
[4] Peking Univ, Acad Adv Interdisciplinary Studies, Beijing, Peoples R China
来源
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019年
基金
中国国家自然科学基金;
关键词
D O I
10.1109/ICCV.2019.00926
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Self-attention mechanism has been widely used for various tasks. It is designed to compute the representation of each position by a weighted sum of the features at all positions. Thus, it can capture long-range relations for computer vision tasks. However, it is computationally consuming. Since the attention maps are computed w.r.t all other positions. In this paper, we formulate the attention mechanism into an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. By a weighted summation upon these bases, the resulting representation is low-rank and deprecates noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of input and is also friendly in memory and computation. Moreover, we set up the bases maintenance and normalization methods to stabilize its training procedure. We conduct extensive experiments on popular semantic segmentation benchmarks including PASCAL VOC, PAS-CAL Context and COCO Stuff, on which we set new records(1).
引用
收藏
页码:9166 / 9175
页数:10
相关论文
共 38 条
[1]  
Ba J. L., 2016, Layer Normalization, DOI 10.48550/arXiv.1607.06450
[2]  
Bahdanau D, 2016, Arxiv, DOI arXiv:1409.0473
[3]   COCO-Stuff: Thing and Stuff Classes in Context [J].
Caesar, Holger ;
Uijlings, Jasper ;
Ferrari, Vittorio .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :1209-1218
[4]   Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [J].
Chen, Liang-Chieh ;
Zhu, Yukun ;
Papandreou, George ;
Schroff, Florian ;
Adam, Hartwig .
COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 :833-851
[5]  
Chen LB, 2017, IEEE INT SYMP NANO, P1, DOI 10.1109/NANOARCH.2017.8053709
[6]  
Chen YP, 2018, ADV NEUR IN, V31
[7]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[8]   Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation [J].
Ding, Henghui ;
Jiang, Xudong ;
Shuai, Bing ;
Liu, Ai Qun ;
Wang, Gang .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :2393-2402
[9]   The Pascal Visual Object Classes (VOC) Challenge [J].
Everingham, Mark ;
Van Gool, Luc ;
Williams, Christopher K. I. ;
Winn, John ;
Zisserman, Andrew .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2010, 88 (02) :303-338
[10]  
FORGY EW, 1965, BIOMETRICS, V21, P768