An Adaptive Post-Processing Network With the Global-Local Aggregation for Semantic Segmentation

Cited by: 6

Authors:

Zhu, Guilin [1]

Wang, Runmin [1]

Liu, Yingying [1]

Zhu, Zhenlin [1]

Gao, Changxin [2]

Liu, Li [3]

Sang, Nong [2]

Affiliations:

[1] Hunan Normal Univ, Sch Informat Sci & Engn, Changsha 410081, Peoples R China

[2] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China

[3] Natl Univ Def Technol, Sch Syst Engn, Changsha 410000, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Context modeling; Semantic segmentation; Task analysis; Predictive models; Transformers; Modeling; Adaptation models; post-processing; global-local aggregation; pixel-aware attention; class-aware attention;

DOI:

10.1109/TCSVT.2023.3292156

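Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract: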

Current semantic segmentation methods mainly focus on modeling the context of the global image to obtain high-quality segmentation results. However, they ignore the role of local image patches, which contain complementary and effective context information. In this paper, we propose an adaptive post-processing network (APPNet) for semantic segmentation based on the predictions of current methods in the global image and local image patches. The key point of APPNet is the global-local aggregation module, which models the context between global predictions and local predictions to generate accurate pixel-wise representation. Furthermore, we develop an adaptive points replacement module to compensate for the lack of fine detail in global prediction and the overconfidence in local predictions. Our method can be readily integrated into existing segmentation methods (i.e., ConvNeXt, HRNet, ViT-Adapter) with little memory and without extra modification in current models. We empirically demonstrate our method brings performance improvements across diverse datasets (i.e., Cityscapes, ADE20K, PASCAL-Context, COCO-Stuff).
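
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of replacing selected points of a global prediction with local ones. It is not the APPNet module. The function name `replace_uncertain_points`, the `ratio` parameter, and the top-2 probability margin used as a confidence score are all assumptions made for illustration. The sketch also assumes the local patch predictions have already been stitched onto the same grid as the global prediction, and it does not address the overconfidence of local predictions that the abstract mentions.

```python
import numpy as np


def replace_uncertain_points(global_prob, local_prob, ratio=0.1):
    """Fuse two class-probability maps of shape (C, H, W), with C >= 2.

    Hypothetical rule (an assumption, not taken from the paper): the
    `ratio` fraction of pixels where the global prediction has the
    smallest top-2 probability margin take their label from the local
    prediction; all other pixels keep the global label.
    Returns the fused label map (H, W) and a boolean mask of replaced pixels.
    """
    if global_prob.shape != local_prob.shape:
        raise ValueError("global_prob and local_prob must share one shape")

    # Confidence of the global prediction: best minus second-best probability.
    sorted_probs = np.sort(global_prob, axis=0)
    margin = sorted_probs[-1] - sorted_probs[-2]

    # Number of pixels to replace.
    k = int(round(ratio * margin.size))

    labels = global_prob.argmax(axis=0)
    flat_mask = np.zeros(margin.size, dtype=bool)
    if k > 0:
        # Indices of the k least confident pixels (order within them is arbitrary).
        uncertain = np.argpartition(margin.ravel(), k - 1)[:k]
        flat_mask[uncertain] = True
    mask = flat_mask.reshape(margin.shape)

    # Overwrite only the selected pixels with the local labels.
    labels[mask] = local_prob.argmax(axis=0)[mask]
    return labels, mask


if __name__ == "__main__":
    # Toy 2-class, 2x2 example; probabilities sum to 1 per pixel.
    g0 = np.array([[0.9, 0.55], [0.8, 0.7]])
    l0 = np.array([[0.2, 0.3], [0.1, 0.4]])
    fused, replaced = replace_uncertain_points(
        np.stack([g0, 1.0 - g0]), np.stack([l0, 1.0 - l0]), ratio=0.25
    )
    print(fused)
    print(replaced)
```

In this toy setup a fixed fraction of pixels is replaced. An adaptive scheme, as the title suggests, would presumably choose the points per image, but how that is done is not stated in the abstract.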

Citation

Pages: 1159-1173

Page count: 15
