MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Citations: 0
Authors
Zhang, Shuai [1]
Xie, Minghong [1]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
Source
FRONTIERS IN PHYSICS | 2024 / Vol. 12
Keywords
RGB-D semantic segmentation; attention mechanism; feature fusion; multi-modal interaction; feature enhancement; INFORMATION; FUSION;
DOI
10.3389/fphy.2024.1411559
Chinese Library Classification
O4 [Physics];
Discipline classification code
0702;
Abstract
Semantic segmentation of RGB-D images requires understanding both the appearance of objects and their spatial relationships within a scene. In indoor scenes, diverse and cluttered objects, illumination variations, and the influence of adjacent objects can easily cause pixels to be misclassified, which degrades the segmentation result. In response to these challenges, we propose a Multi-modal Interaction and Pooling Attention Network (MIPANet). The network is designed to exploit the interaction between the RGB and depth modalities, making better use of their complementary information to improve segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network. This module fuses RGB and depth information so that each modality can enhance and correct the other. Moreover, we introduce a Pooling Attention Module (PAM) at various stages of the encoder to enhance the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYU-Depth V2 and SUN-RGBD, by addressing the insufficient information interaction between modalities in RGB-D semantic segmentation. The source code is available at https://github.com/2295104718/MIPANet.
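The abstract does not specify how the PAM or MIM is built internally, so the following is only a generic illustration of pooling-based channel attention applied to RGB and depth feature maps, not the authors' implementation. The function names (`pooling_attention`, `fuse`), the projection matrices, and the additive fusion are all assumptions made for this sketch; the actual design is in the linked repository.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def pooling_attention(feat, weight):
    """Reweight channels using a gate computed from global average pooling.

    feat:   feature map of shape (C, H, W)
    weight: hypothetical (C, C) projection standing in for learned parameters
    """
    pooled = feat.mean(axis=(1, 2))        # global average pool -> (C,)
    gate = sigmoid(weight @ pooled)        # per-channel gate in (0, 1)
    return feat * gate[:, None, None]      # broadcast gate over H and W


def fuse(rgb, depth, w_rgb, w_depth):
    """Assumed fusion: attend to each modality separately, then sum."""
    return pooling_attention(rgb, w_rgb) + pooling_attention(depth, w_depth)


# Toy inputs: 4 channels on an 8x8 grid for each modality.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 8, 8))
depth = rng.standard_normal((4, 8, 8))
w = np.eye(4)                              # identity projection for the demo

fused = fuse(rgb, depth, w, w)
print(fused.shape)                         # (4, 8, 8)
```

The gate depends only on channel-wise pooled statistics, so the spatial layout of each feature map is preserved while informative channels are emphasised. A trained network would learn the projection weights rather than use the identity matrix.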
Pages: 13