MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Cited by: 0

Authors:

Zhang, Shuai [1]

Xie, Minghong [1]

Affiliations:

[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China

Source:

FRONTIERS IN PHYSICS | 2024, Vol. 12

Keywords:

RGB-D semantic segmentation; attention mechanism; feature fusion; multi-modal interaction; feature enhancement; INFORMATION; FUSION

DOI:

10.3389/fphy.2024.1411559

Chinese Library Classification (CLC):

O4 [Physics]

Discipline classification code:

0702

Abstract:

The semantic segmentation of RGB-D images involves understanding objects' appearances and spatial relationships within a scene, which necessitates careful consideration of multiple factors. In indoor scenes, the presence of diverse and disorderly objects, coupled with illumination variations and the influence of adjacent objects, can easily result in misclassification of pixels, consequently affecting the outcome of semantic segmentation. We propose a Multi-modal Interaction and Pooling Attention Network (MIPANet) in response to these challenges. This network is designed to exploit the interactive synergy between RGB and depth modalities, aiming to enhance the utilization of complementary information and improve segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Moreover, we introduce a Pooling Attention Module (PAM) at various stages of the encoder to enhance the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYU-Depth V2 and SUN-RGBD, by optimizing the insufficient information interaction between different modalities in RGB-D semantic segmentation. The source code is available at https://github.com/2295104718/MIPANet.
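As a rough illustration of the general idea of pooling-based attention applied to two modalities, the sketch below gates each feature channel by a value derived from its spatially pooled mean and then sums the enhanced RGB and depth features. It is not the authors' implementation: the function names (`global_avg_pool`, `pooling_attention`, `fuse`), the sigmoid gate, the residual form, and the element-wise sum are all assumptions made for illustration, and the actual PAM and MIM designs should be taken from the paper and its repository.

```python
import math

def global_avg_pool(feat):
    """feat: list of channels, each a 2-D list (H x W). Returns one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pooling_attention(feat):
    """Hypothetical channel gate: scale each channel by sigmoid(pooled mean),
    then add the original value back (residual), i.e. out = v + gate * v."""
    gates = [sigmoid(m) for m in global_avg_pool(feat)]
    return [[[v + g * v for v in row] for row in ch] for ch, g in zip(feat, gates)]

def fuse(rgb, depth):
    """Assumed fusion: element-wise sum of attention-enhanced RGB and depth features."""
    r, d = pooling_attention(rgb), pooling_attention(depth)
    return [
        [[a + b for a, b in zip(r_row, d_row)] for r_row, d_row in zip(r_ch, d_ch)]
        for r_ch, d_ch in zip(r, d)
    ]

# Toy usage: one channel, 2 x 2 spatial size per modality.
rgb = [[[1.0, -1.0], [-1.0, 1.0]]]
depth = [[[0.0, 0.0], [0.0, 0.0]]]
fused = fuse(rgb, depth)
```

In a real network the gate would come from learned layers rather than a fixed sigmoid of the mean; the fixed gate here only keeps the example self-contained.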

Pages: 13
