MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Cited by: 0
Authors
Zhang, Shuai [1]
Xie, Minghong [1]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
Source
FRONTIERS IN PHYSICS | 2024 / Volume 12
Keywords
RGB-D semantic segmentation; attention mechanism; feature fusion; multi-modal interaction; feature enhancement; INFORMATION; FUSION
DOI
10.3389/fphy.2024.1411559
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
The semantic segmentation of RGB-D images involves understanding object appearances and spatial relationships within a scene, which necessitates careful consideration of multiple factors. In indoor scenes, the presence of diverse and disorderly objects, coupled with illumination variations and the influence of adjacent objects, can easily result in pixel misclassification, degrading the segmentation result. We propose a Multi-modal Interaction and Pooling Attention Network (MIPANet) in response to these challenges. This network is designed to exploit the interactive synergy between the RGB and depth modalities, aiming to enhance the utilization of complementary information and improve segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Moreover, we introduce a Pooling Attention Module (PAM) at various stages of the encoder to enhance the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYU-Depth V2 and SUN-RGBD, by addressing the insufficient information interaction between modalities in RGB-D semantic segmentation. The source code is available at https://github.com/2295104718/MIPANet.
Pages: 13