CGAN-NET: CLASS-GUIDED ASYMMETRIC NON-LOCAL NETWORK FOR REAL-TIME SEMANTIC SEGMENTATION

Cited by: 1

Authors:

Chen, Hanlin [1]

Hu, Qingyong [2]

Yang, Jungang [1]

Wu, Jing [1]

Guo, Yulan [1,3]

Affiliations:

[1] Natl Univ Def Technol, Zunyi, Guizhou, Peoples R China

[2] Univ Oxford, Oxford, England

[3] Sun Yat Sen Univ, Guangzhou, Peoples R China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

DOI:

10.1109/ICASSP39728.2021.9414957

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Remarkable progress has recently been achieved in semantic segmentation by introducing various non-local blocks to capture long-range dependencies. However, the improvement in segmentation accuracy usually comes at the price of a significant reduction in network efficiency, as non-local blocks usually require expensive computation and memory for dense pixel-to-pixel correlation. In this paper, we introduce a Class-Guided Asymmetric Non-local Network (CGAN-Net) to enhance the class-discriminability of the learned feature map while maintaining real-time efficiency. The key to our approach is to calculate the dense similarity matrix on coarse semantic prediction maps instead of the high-dimensional latent feature map. This is not only efficient in computation and memory, but also helps to learn query-dependent global context. Experiments conducted on Cityscapes and CamVid demonstrate the compelling performance of our CGAN-Net. In particular, our network achieves 76.8% mean IoU on the Cityscapes test set at a speed of 38 FPS for 1024x2048 images on a single Tesla V100 GPU.
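The central idea in the abstract, computing pixel-to-pixel similarity in the low-dimensional class-prediction space rather than in the high-dimensional feature space, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the data layout (one list per pixel), the dot-product similarity, and the softmax normalisation are all assumptions made for illustration, and the asymmetric (subsampled key) part of the network is not modelled.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def class_guided_context(features, class_scores):
    """Aggregate global context with affinities computed from class scores.

    features:     N pixels, each a list of C feature values (assumed layout).
    class_scores: N pixels, each a list of K coarse class scores, K << C.
    Returns N context vectors of length C, one per query pixel.
    """
    probs = [softmax(s) for s in class_scores]  # N x K class probabilities
    n_channels = len(features[0])
    context = []
    for q in probs:
        # Similarity of this query pixel to every pixel, in K-dim class space.
        sims = [sum(a * b for a, b in zip(q, k)) for k in probs]
        weights = softmax(sims)  # attention weights over all N pixels
        ctx = [sum(w * f[c] for w, f in zip(weights, features))
               for c in range(n_channels)]
        context.append(ctx)
    return context


def similarity_macs(n_pixels, dim):
    """Multiply-accumulates for a dense N x N dot-product similarity matrix."""
    return n_pixels * n_pixels * dim


# Toy input: 3 pixels, 2 feature channels, 2 classes (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_scores = [[5.0, 0.0], [0.0, 5.0], [5.0, 0.0]]
ctx = class_guided_context(features, class_scores)
```

In this toy input, pixels 0 and 2 share the same class scores and therefore receive the same context vector, while pixel 1 receives a different one, which is the query-dependent behaviour the abstract refers to. The cost of the similarity matrix scales with the dimension in which it is computed, so `similarity_macs(n, K)` is smaller than `similarity_macs(n, C)` by a factor of C / K.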

Citation

Pages: 2325-2329

Page count: 5

References (27 in total):

[1] [Anonymous], 2018, CGNET LIGHT WEIGHT C
[2] Azuma, RT. A survey of augmented reality [J]. PRESENCE-VIRTUAL AND AUGMENTED REALITY, 1997, 6(04): 355-385.
[3] Badrinarayanan, Vijay; Kendall, Alex; Cipolla, Roberto. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39(12): 2481-2495.
[4] Brostow, Gabriel J.; Fauqueur, Julien; Cipolla, Roberto. Semantic object classes in video: A high-definition ground truth database [J]. PATTERN RECOGNITION LETTERS, 2009, 30(02): 88-97.
[5] Cao, Yue; Xu, Jiarui; Lin, Stephen; Wei, Fangyun; Hu, Han. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019: 1971-1980.
[6] Chen, Liang-Chieh; Papandreou, George; Kokkinos, Iasonas; Murphy, Kevin; Yuille, Alan L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40(04): 834-848.
[7] Chen LB, 2017, IEEE INT SYMP NANO, P1, DOI 10.1109/NANOARCH.2017.8053709
[8] Cordts, Marius; Omran, Mohamed; Ramos, Sebastian; Rehfeld, Timo; Enzweiler, Markus; Benenson, Rodrigo; Franke, Uwe; Roth, Stefan; Schiele, Bernt. The Cityscapes Dataset for Semantic Urban Scene Understanding [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 3213-3223.
[9] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Deep Residual Learning for Image Recognition [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778.
[10] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37(09): 1904-1916.
