MMSMCNet: Modal Memory Sharing and Morphological Complementary Networks for RGB-T Urban Scene Semantic Segmentation

Cited by: 32
Authors
Zhou, Wujie [1 ,2 ]
Zhang, Han [1 ]
Yan, Weiqing [3 ]
Lin, Weisi [2 ]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 308232, Singapore
[3] Yantai Univ, Sch Comp & Control Engn, Yantai 264005, Peoples R China
Keywords
RGB-T semantic segmentation; information memory sharing; contour skeleton positioning; morphological complementary; complementary supervision strategy; FUSION;
DOI
10.1109/TCSVT.2023.3275314
CLC Classification (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Combining color (RGB) images with thermal images can facilitate the semantic segmentation of poorly lit urban scenes. However, for RGB-thermal (RGB-T) semantic segmentation, most existing models address cross-modal feature fusion by focusing only on the individual samples being processed while neglecting the connections between different samples. Additionally, although the importance of boundary, binary, and semantic information is considered in the decoding process, the differences and complementarities between different morphological features are usually neglected. In this paper, we propose a novel RGB-T semantic segmentation network, called MMSMCNet, based on modal memory fusion and morphological multiscale assistance, to address these problems. In the encoding part, we use SegFormer to extract features from the bimodal inputs. Our modal memory sharing module then implements staged learning and memory sharing of sample information across modalities and scales. Furthermore, we construct a decoding union comprising three decoding units arranged in a layer-by-layer progression; each unit extracts two different morphological features according to the information category and exploits multiscale cross-modal fusion information in a complementary manner. Each unit contains a contour positioning module based on detail information, a skeleton positioning module with deep features as the primary input, and a morphological complementary module that mutually reinforces the first two types of information and constructs semantic information. On this basis, we construct a new multi-unit complementary supervision strategy. Extensive experiments on two standard datasets show that MMSMCNet outperforms related state-of-the-art methods. The code is available at: https://github.com/2021nihao/MMSMCNet.
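The pipeline outlined in the abstract (dual-modality encoding, per-scale modal memory sharing, and three stacked decoding units with per-unit supervision heads) can be summarized in the minimal PyTorch-style sketch below. All class names (TinyEncoder, MemorySharing, DecodingUnit, RGBTSegSketch), channel widths, and module internals are illustrative assumptions rather than the authors' implementation; in particular, the small convolutional encoder merely stands in for the SegFormer backbone, and the single-channel thermal input is an assumption. See the official repository for the real code.

# Minimal sketch of the structure described in the abstract (assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Placeholder for the SegFormer backbone: four downsampling stages."""
    def __init__(self, in_ch, widths=(32, 64, 128, 256)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, w, 3, stride=2, padding=1),
                nn.BatchNorm2d(w), nn.ReLU(inplace=True)))
            prev = w

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # shallow -> deep


class MemorySharing(nn.Module):
    """Stand-in for the modal memory sharing module at one scale."""
    def __init__(self, ch):
        super().__init__()
        self.fuse = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

    def forward(self, rgb, thermal):
        return self.fuse(torch.cat([rgb, thermal], dim=1))


class DecodingUnit(nn.Module):
    """Contour branch (shallow detail), skeleton branch (deep features),
    and a complementary branch merging them into semantic features."""
    def __init__(self, shallow_ch, deep_ch, num_classes):
        super().__init__()
        self.contour = nn.Conv2d(shallow_ch, shallow_ch, 3, padding=1)
        self.skeleton = nn.Conv2d(deep_ch, shallow_ch, 3, padding=1)
        self.complement = nn.Conv2d(2 * shallow_ch, shallow_ch, 3, padding=1)
        self.aux_head = nn.Conv2d(shallow_ch, num_classes, 1)  # per-unit supervision

    def forward(self, shallow, deep):
        deep = F.interpolate(deep, size=shallow.shape[2:], mode="bilinear",
                             align_corners=False)
        c = F.relu(self.contour(shallow))
        s = F.relu(self.skeleton(deep))
        sem = F.relu(self.complement(torch.cat([c, s], dim=1)))
        return sem, self.aux_head(sem)


class RGBTSegSketch(nn.Module):
    def __init__(self, num_classes=9, widths=(32, 64, 128, 256)):
        super().__init__()
        self.rgb_enc = TinyEncoder(3, widths)
        self.th_enc = TinyEncoder(1, widths)   # assumes single-channel thermal input
        self.share = nn.ModuleList(MemorySharing(w) for w in widths)
        self.units = nn.ModuleList([
            DecodingUnit(widths[2], widths[3], num_classes),
            DecodingUnit(widths[1], widths[2], num_classes),
            DecodingUnit(widths[0], widths[1], num_classes),
        ])
        self.head = nn.Conv2d(widths[0], num_classes, 1)

    def forward(self, rgb, thermal):
        fused = [m(r, t) for m, r, t in
                 zip(self.share, self.rgb_enc(rgb), self.th_enc(thermal))]
        x, aux = fused[-1], []
        for unit, shallow in zip(self.units, (fused[2], fused[1], fused[0])):
            x, a = unit(shallow, x)     # layer-by-layer progression through units
            aux.append(a)               # auxiliary maps for complementary supervision
        logits = F.interpolate(self.head(x), size=rgb.shape[2:],
                               mode="bilinear", align_corners=False)
        return logits, aux


if __name__ == "__main__":
    model = RGBTSegSketch(num_classes=9)
    out, aux = model(torch.randn(1, 3, 128, 160), torch.randn(1, 1, 128, 160))
    print(out.shape, [a.shape for a in aux])

In training, the auxiliary maps returned alongside the final logits would each receive their own loss term, which is how a multi-unit supervision strategy of the kind named in the abstract is typically wired up.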
Pages: 7096-7108
Number of pages: 13