Augmented FCN: rethinking context modeling for semantic segmentation

Cited by: 14
Authors
Zhang, Dong [1]
Zhang, Liyan [2]
Tang, Jinhui [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
semantic segmentation; context modeling; long-range dependencies; attention mechanism; network
DOI
10.1007/s11432-021-3590-1
Chinese Library Classification
TP [automation technology; computer technology]
Subject Classification Code
0812
Abstract
The effectiveness of modeling contextual information has been empirically shown in numerous computer vision tasks. In this paper, we propose a simple yet efficient augmented fully convolutional network (AugFCN) that aggregates content- and position-based object contexts for semantic segmentation. Specifically, motivated by the observation that each deep feature map is a global, class-wise representation of the input, we first propose an augmented nonlocal interaction (AugNI) to aggregate global content-based contexts through interactions between whole feature maps. Compared to classical position-wise approaches, AugNI is more efficient. Moreover, to eliminate permutation equivariance while maintaining translation equivariance, a learnable relative position embedding branch is additionally installed in AugNI to capture global position-based contexts. AugFCN is built on a fully convolutional network backbone by deploying AugNI before the segmentation head network. Experimental results on two challenging benchmarks verify that AugFCN achieves competitive results of 45.38% mIoU (standard mean intersection over union) and 81.9% mIoU on the ADE20K val set and Cityscapes test set, respectively, with little computational overhead. Additionally, jointly applying AugNI with existing context modeling schemes shows that AugFCN yields consistent segmentation improvements over state-of-the-art context modeling. We finally achieve a top performance of 45.43% mIoU on the ADE20K val set and 83.0% mIoU on the Cityscapes test set.
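The abstract combines two standard ingredients: an attention-style interaction for content-based context and a learnable relative position bias for position-based context (relative rather than absolute, which is what preserves translation equivariance while breaking permutation equivariance). As a rough illustration of how these two pieces fit together, here is a minimal 1-D sketch of scaled dot-product attention with a learnable relative position bias. All names, shapes, and weight layouts are illustrative assumptions; this is not the authors' AugNI implementation, which interacts whole feature maps rather than individual positions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_relative_pos(x, w_q, w_k, w_v, rel_bias):
    """Self-attention over n tokens with a learnable relative position bias.

    x:        (n, c) feature vectors (illustrative stand-in for feature maps)
    w_q/k/v:  (c, c) projection weights
    rel_bias: (2n - 1,) bias indexed by the relative offset i - j; because the
              bias depends only on i - j, the module is translation-equivariant
              but no longer permutation-equivariant.
    """
    n, _ = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = q @ k.T / np.sqrt(k.shape[1])          # content-based term
    idx = np.arange(n)[:, None] - np.arange(n)[None, :] + (n - 1)
    logits = logits + rel_bias[idx]                 # position-based term
    return softmax(logits, axis=-1) @ v
```

In the paper's scheme the content-based interaction reportedly runs between whole feature maps (each treated as a global, class-wise descriptor), which is cheaper than the position-wise attention sketched here; the relative-bias idea carries over unchanged.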
Pages: 19