MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Citations: 0
Authors
Zhang, Shuai [1]
Xie, Minghong [1]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
Source
FRONTIERS IN PHYSICS | 2024 / Volume 12
Keywords
RGB-D semantic segmentation; attention mechanism; feature fusion; multi-modal interaction; feature enhancement; INFORMATION; FUSION
DOI
10.3389/fphy.2024.1411559
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
The semantic segmentation of RGB-D images involves understanding objects' appearances and spatial relationships within a scene, which necessitates careful consideration of multiple factors. In indoor scenes, the presence of diverse and disorderly objects, coupled with illumination variations and the influence of adjacent objects, can easily result in misclassification of pixels, consequently affecting the outcome of semantic segmentation. We propose a Multi-modal Interaction and Pooling Attention Network (MIPANet) in response to these challenges. This network is designed to exploit the interactive synergy between RGB and depth modalities, aiming to enhance the utilization of complementary information and improve segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Moreover, we introduce a Pooling Attention Module (PAM) at various stages of the encoder to enhance the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYU-Depth V2 and SUN-RGBD, by optimizing the insufficient information interaction between different modalities in RGB-D semantic segmentation. The source code is available at https://github.com/2295104718/MIPANet.
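To make the idea of pooling-based attention concrete, the following is a minimal NumPy sketch. It is not the authors' PAM or MIM: the abstract does not specify their internals, so this assumes a simple squeeze-and-excitation style gate (global average pooling, a linear map, a sigmoid) applied to each modality, followed by element-wise addition. The function names `pooling_attention` and `fuse`, the weight matrices, and the additive fusion are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def pooling_attention(feat, weight):
    """Reweight channels of a (C, H, W) feature map.

    `weight` is a hypothetical (C, C) matrix standing in for learned
    parameters; the real PAM may be structured differently.
    """
    pooled = feat.mean(axis=(1, 2))        # global average pooling -> (C,)
    gate = sigmoid(weight @ pooled)        # per-channel gate in (0, 1)
    return feat * gate[:, None, None]      # broadcast the gate over H and W


def fuse(rgb, depth, w_rgb, w_depth):
    """Assumed fusion: sum of the two attention-enhanced feature maps."""
    return pooling_attention(rgb, w_rgb) + pooling_attention(depth, w_depth)
```

With feature maps `rgb` and `depth` of shape `(C, H, W)`, calling `fuse(rgb, depth, w_rgb, w_depth)` returns a map of the same shape. For the actual module definitions, see the repository linked in the abstract.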
Pages: 13
Related papers
50 in total
  • [41] Robust 3D Semantic Segmentation Method Based on Multi-Modal Collaborative Learning
    Ni, Peizhou
    Li, Xu
    Xu, Wang
    Zhou, Xiaojing
    Jiang, Tao
    Hu, Weiming
    REMOTE SENSING, 2024, 16 (03)
  • [42] Multi-Modal Attention Network Learning for Semantic Source Code Retrieval
    Wan, Yao
    Shu, Jingdong
    Sui, Yulei
    Xu, Guandong
    Zhao, Zhou
    Wu, Jian
    Yu, Philip S.
    34TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE 2019), 2019, : 13 - 25
  • [43] An improved YOLOv7 network using RGB-D multi-modal feature fusion for tea shoots detection
    Wu, Yanxu
    Chen, Jianneng
    Wu, Shunkai
    Li, Hui
    He, Leiying
    Zhao, Runmao
    Wu, Chuanyu
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2024, 216
  • [44] EISNet: A Multi-Modal Fusion Network for Semantic Segmentation With Events and Images
    Xie, Bochen
    Deng, Yongjian
    Shao, Zhanpeng
    Li, Youfu
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8639 - 8650
  • [45] Fusion based on attention mechanism and context constraint for multi-modal brain tumor segmentation
    Zhou, Tongxue
    Canu, Stephane
    Ruan, Su
    COMPUTERIZED MEDICAL IMAGING AND GRAPHICS, 2020, 86
  • [46] Cross-Modal Adaptive Interaction Network for RGB-D Saliency Detection
    Du, Qinsheng
    Bian, Yingxu
    Wu, Jianyu
    Zhang, Shiyan
    Zhao, Jian
    APPLIED SCIENCES-BASEL, 2024, 14 (17)
  • [47] HDBFormer: Efficient RGB-D Semantic Segmentation With a Heterogeneous Dual-Branch Framework
    Wei, Shuobin
    Zhou, Zhuang
    Lu, Zhengan
    Yuan, Zizhao
    Su, Binghua
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 91 - 95
  • [48] COUPLING TWO-STREAM RGB-D SEMANTIC SEGMENTATION NETWORK BY IDEMPOTENT MAPPINGS
    Xing, Yajie
    Wang, Jingbo
    Chen, Xiaokang
    Zeng, Gang
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 1850 - 1854
  • [49] FGMNet: Feature grouping mechanism network for RGB-D indoor scene semantic segmentation
    Zhang, Yuming
    Zhou, Wujie
    Ye, Lv
    Yu, Lu
    Luo, Ting
    DIGITAL SIGNAL PROCESSING, 2024, 149
  • [50] RGB-D Domain adaptive semantic segmentation with cross-modality feature recalibration
    Fan, Qizhe
    Shen, Xiaoqin
    Ying, Shihui
    Wang, Juan
    Du, Shaoyi
    INFORMATION FUSION, 2025, 120