Augmented FCN: rethinking context modeling for semantic segmentation

Cited by: 16
Authors
Zhang, Dong [1 ]
Zhang, Liyan [2 ]
Tang, Jinhui [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
Funding
National Natural Science Foundation of China (NSFC);
Keywords
semantic segmentation; context modeling; long-range dependencies; attention mechanism; NETWORK;
DOI
10.1007/s11432-021-3590-1
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812 ;
Abstract
The effectiveness of modeling contextual information has been empirically demonstrated in numerous computer vision tasks. In this paper, we propose a simple yet efficient augmented fully convolutional network (AugFCN) that aggregates content- and position-based object contexts for semantic segmentation. Specifically, motivated by the observation that each deep feature map is a global, class-wise representation of the input, we first propose an augmented nonlocal interaction (AugNI) that aggregates global content-based contexts through interactions among whole feature maps. Compared to classical position-wise approaches, AugNI is more efficient. Moreover, to eliminate permutation equivariance while maintaining translation equivariance, a learnable relative position embedding branch is incorporated into AugNI to capture global position-based contexts. AugFCN is built on a fully convolutional network backbone by deploying AugNI before the segmentation head. Experimental results on two challenging benchmarks verify that AugFCN achieves a competitive 45.38% mIoU (mean intersection over union) on the ADE20K val set and 81.9% mIoU on the Cityscapes test set with little computational overhead. Additionally, jointly applying AugNI with existing context modeling schemes shows that AugFCN brings consistent segmentation improvements to state-of-the-art context modeling. We finally achieve a top performance of 45.43% mIoU on the ADE20K val set and 83.0% mIoU on the Cityscapes test set.
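The abstract contrasts AugNI's aggregation "through all feature map interactions" with classical position-wise nonlocal attention. The following is a minimal sketch of that general idea only, not the authors' exact formulation: computing affinities among the C feature maps instead of among the N = H*W spatial positions drops the attention cost from O(N^2 C) to O(C^2 N), which is the source of the efficiency claim. The function name and the NumPy layout are assumptions for illustration; the learnable relative position embedding branch is omitted here.

```python
import numpy as np

def map_wise_context(x):
    """Hedged sketch of map-wise (channel-wise) context aggregation.

    x: (N, C) array, N = H*W flattened spatial positions, C feature maps.
    Affinities are computed between whole feature maps, giving a (C, C)
    attention matrix -- O(C^2 * N) work, versus O(N^2 * C) for the
    classical position-wise nonlocal block.
    """
    N = x.shape[0]
    affinity = x.T @ x / np.sqrt(N)               # (C, C) map-to-map affinities
    affinity -= affinity.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(affinity)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over feature maps
    return x @ attn.T                             # globally re-mix the maps
```

In the paper's setting this content-based branch would be combined with a learnable relative position bias to recover position-based context, and the result inserted before the segmentation head.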
Pages: 19
Cited References
86 records
[21]   Human motion segmentation based on structure constraint matrix factorization [J].
Gao, Hongbo ;
Guo, Fang ;
Zhu, Juping ;
Kan, Zhen ;
Zhang, Xinyu .
SCIENCE CHINA-INFORMATION SCIENCES, 2022, 65 (01)
[22]  
Goodfellow I, 2016, ADAPT COMPUT MACH LE, P1
[23]  
Guo MH, 2022, Arxiv, DOI arXiv:2202.09741
[24]   Adaptive Pyramid Context Network for Semantic Segmentation [J].
He, Junjun ;
Deng, Zhongying ;
Zhou, Lei ;
Wang, Yali ;
Qiao, Yu .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :7511-7520
[25]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[26]   Strip Pooling: Rethinking Spatial Pooling for Scene Parsing [J].
Hou, Qibin ;
Zhang, Li ;
Cheng, Ming-Ming ;
Feng, Jiashi .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :4002-4011
[27]  
Hu J, 2018, PROC CVPR IEEE, P7132, DOI [10.1109/TPAMI.2019.2913372, 10.1109/CVPR.2018.00745]
[28]  
Huang C Z A, 2018, P INT C NEURAL INFOR
[29]   Densely Connected Convolutional Networks [J].
Huang, Gao ;
Liu, Zhuang ;
van der Maaten, Laurens ;
Weinberger, Kilian Q. .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :2261-2269
[30]   CCNet: Criss-Cross Attention for Semantic Segmentation [J].
Huang, Zilong ;
Wang, Xinggang ;
Huang, Lichao ;
Huang, Chang ;
Wei, Yunchao ;
Liu, Wenyu .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :603-612