CGAN-NET: CLASS-GUIDED ASYMMETRIC NON-LOCAL NETWORK FOR REAL-TIME SEMANTIC SEGMENTATION

Cited by: 1
Authors
Chen, Hanlin [1]
Hu, Qingyong [2]
Yang, Jungang [1]
Wu, Jing [1]
Guo, Yulan [1, 3]
Affiliations
[1] Natl Univ Def Technol, Zunyi, Guizhou, Peoples R China
[2] Univ Oxford, Oxford, England
[3] Sun Yat Sen Univ, Guangzhou, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
DOI
10.1109/ICASSP39728.2021.9414957
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Semantic segmentation has recently made remarkable progress by introducing various non-local blocks to capture long-range dependencies. However, the improvement in segmentation accuracy usually comes at the price of a significant reduction in network efficiency, as non-local blocks usually require expensive computation and memory for dense pixel-to-pixel correlation. In this paper, we introduce a Class-Guided Asymmetric Non-local Network (CGAN-Net) to enhance class discriminability in the learned feature maps while maintaining real-time efficiency. The key to our approach is to calculate the dense similarity matrix on coarse semantic prediction maps instead of the high-dimensional latent feature map. This is not only computationally and memory efficient, but also helps to learn query-dependent global context. Experiments conducted on Cityscapes and CamVid demonstrate the compelling performance of our CGAN-Net. In particular, our network achieves 76.8% mean IoU on the Cityscapes test set at 38 FPS for 1024x2048 images on a single Tesla V100 GPU.
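The central idea of the abstract, computing pixel-to-pixel similarity from coarse class predictions rather than from high-dimensional features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`class_guided_context`, `similarity_mults`), the use of a softmax-normalised dot product, and the flat list-of-pixels layout are all assumptions made for illustration, and the "asymmetric" part of the design is not modelled here because the abstract does not describe it.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def class_guided_context(features, class_scores):
    """Aggregate features with attention weights derived from class scores.

    features:     N x C nested list, one C-dimensional feature per pixel.
    class_scores: N x K nested list, coarse per-pixel class logits (K << C).
    Returns an N x C nested list of context vectors.

    Assumption: similarity is a dot product of per-pixel class
    probabilities, normalised per query pixel with a softmax.
    """
    probs = [softmax(scores) for scores in class_scores]
    channels = len(features[0])
    context = []
    for query in probs:
        # Similarity of this query pixel to every pixel, in K dimensions.
        sims = [sum(q * k for q, k in zip(query, key)) for key in probs]
        weights = softmax(sims)
        # Query-dependent weighted sum of the full C-dimensional features.
        context.append([
            sum(w * feat[c] for w, feat in zip(weights, features))
            for c in range(channels)
        ])
    return context


def similarity_mults(num_pixels, dim):
    """Multiplications needed for a dense N x N dot-product similarity."""
    return num_pixels * num_pixels * dim
```

Under these assumptions the N x N similarity costs on the order of N²·K multiplications instead of N²·C, so with hypothetical values such as K = 19 classes and C = 512 channels the similarity step needs roughly 27 times fewer multiplications. Because each query pixel gets its own weight vector, the aggregated context differs per query, which is one way to read "query-dependent global context".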
Pages: 2325-2329
Page count: 5
Related papers
27 in total
  • [1] [Anonymous], 2018, CGNET LIGHT WEIGHT C
  • [2] A survey of augmented reality
    Azuma, RT
    [J]. PRESENCE-VIRTUAL AND AUGMENTED REALITY, 1997, 6 (04) : 355 - 385
  • [3] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
    Badrinarayanan, Vijay
    Kendall, Alex
    Cipolla, Roberto
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) : 2481 - 2495
  • [4] Semantic object classes in video: A high-definition ground truth database
    Brostow, Gabriel J.
    Fauqueur, Julien
    Cipolla, Roberto
    [J]. PATTERN RECOGNITION LETTERS, 2009, 30 (02) : 88 - 97
  • [5] GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
    Cao, Yue
    Xu, Jiarui
    Lin, Stephen
    Wei, Fangyun
    Hu, Han
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019 : 1971 - 1980
  • [6] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
    Chen, Liang-Chieh
    Papandreou, George
    Kokkinos, Iasonas
    Murphy, Kevin
    Yuille, Alan L.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) : 834 - 848
  • [7] Chen LB, 2017, IEEE INT SYMP NANO, P1, DOI 10.1109/NANOARCH.2017.8053709
  • [8] The Cityscapes Dataset for Semantic Urban Scene Understanding
    Cordts, Marius
    Omran, Mohamed
    Ramos, Sebastian
    Rehfeld, Timo
    Enzweiler, Markus
    Benenson, Rodrigo
    Franke, Uwe
    Roth, Stefan
    Schiele, Bernt
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016 : 3213 - 3223
  • [9] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016 : 770 - 778
  • [10] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (09) : 1904 - 1916