MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Cited by: 0

Authors:

Zhang, Shuai [1]

Xie, Minghong [1]

Affiliations:

[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China

Source:

FRONTIERS IN PHYSICS | 2024 / Volume 12

Keywords:

RGB-D semantic segmentation; attention mechanism; feature fusion; multi-modal interaction; feature enhancement; INFORMATION; FUSION;

DOI:

10.3389/fphy.2024.1411559

Chinese Library Classification (CLC) number:

O4 [Physics];

Discipline classification code:

0702;

Abstract:

The semantic segmentation of RGB-D images involves understanding object appearances and spatial relationships within a scene, which requires careful consideration of multiple factors. In indoor scenes, diverse and disorderly objects, illumination variations, and the influence of adjacent objects can easily cause pixels to be misclassified, degrading the segmentation result. In response to these challenges, we propose a Multi-modal Interaction and Pooling Attention Network (MIPANet). The network is designed to exploit the interactive synergy between the RGB and depth modalities, making better use of their complementary information to improve segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network. This module fuses RGB and depth information so that the two modalities can enhance and correct each other. Moreover, we introduce a Pooling Attention Module (PAM) at various stages of the encoder to enhance the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYU-Depth V2 and SUN-RGBD, by addressing the insufficient information interaction between modalities in RGB-D semantic segmentation. The source code is available at https://github.com/2295104718/MIPANet.


Pages: 13

50 items in total

[1] Intra-inter Modal Attention Blocks for RGB-D Semantic Segmentation
Choi, Soyun
Zhang, Youjia
Hong, Sungeun
PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023, : 217 - 225
[2] CDMANet: central difference mutual attention network for RGB-D semantic segmentation
Ge, Mengjiao
Su, Wen
Gao, Jinfeng
Jia, Guoqiang
JOURNAL OF SUPERCOMPUTING, 2025, 81 (01)
[3] RGB-D Dual Modal Information Complementary Semantic Segmentation Network
Wang L.
Gu N.
Xin J.
Wang S.
Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35 (10) : 1489 - 1499
[4] Attention-based fusion network for RGB-D semantic segmentation
Zhong, Li
Guo, Chi
Zhan, Jiao
Deng, JingYi
NEUROCOMPUTING, 2024, 608
[5] MULTI-MODAL TRANSFORMER FOR RGB-D SALIENT OBJECT DETECTION
Song, Peipei
Zhang, Jing
Koniusz, Piotr
Barnes, Nick
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2466 - 2470
[6] Interactive Efficient Multi-Task Network for RGB-D Semantic Segmentation
Xu, Xinhua
Liu, Jinfu
Liu, Hong
ELECTRONICS, 2023, 12 (18)
[7] A Cross-Modal Feature Fusion Model Based on ConvNeXt for RGB-D Semantic Segmentation
Tang, Xiaojiang
Li, Baoxia
Guo, Junwei
Chen, Wenzhuo
Zhang, Dan
Huang, Feng
MATHEMATICS, 2023, 11 (08)
[8] MMPL-Net: multi-modal prototype learning for one-shot RGB-D segmentation
Shan, Dexing
Zhang, Yunzhou
Liu, Xiaozheng
Liu, Shitong
Coleman, Sonya A.
Kerr, Dermot
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (14) : 10297 - 10310
[9] MMPL-Net: multi-modal prototype learning for one-shot RGB-D segmentation
Dexing Shan
Yunzhou Zhang
Xiaozheng Liu
Shitong Liu
Sonya A. Coleman
Dermot Kerr
Neural Computing and Applications, 2023, 35 : 10297 - 10310
[10] MULTI-MODAL FEATURE FUSION FOR ACTION RECOGNITION IN RGB-D SEQUENCES
Shahroudy, Amir
Wang, Gang
Ng, Tian-Tsong
2014 6TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS, CONTROL AND SIGNAL PROCESSING (ISCCSP), 2014, : 73 - 76
