MIPANet: optimizing RGB-D semantic segmentation through multi-modal interaction and pooling attention

Cited by: 0

Authors:

Zhang, Shuai [1]

Xie, Minghong [1]

Affiliations:

[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China

Source:

FRONTIERS IN PHYSICS | 2024 / Volume 12

Keywords:

RGB-D semantic segmentation; attention mechanism; feature fusion; multi-modal interaction; feature enhancement; INFORMATION; FUSION

DOI:

10.3389/fphy.2024.1411559

Chinese Library Classification (CLC) number:

O4 [Physics]

Discipline classification code:

0702

Abstract:

Semantic segmentation of RGB-D images requires understanding both the appearance of objects and their spatial relationships within a scene. In indoor scenes, diverse and cluttered objects, illumination variations, and the influence of adjacent objects can easily cause pixels to be misclassified, degrading the segmentation result. In response to these challenges, we propose a Multi-modal Interaction and Pooling Attention Network (MIPANet). The network is designed to exploit the interaction between the RGB and depth modalities, making better use of their complementary information to improve segmentation accuracy. Specifically, we incorporate a Multi-modal Interaction Module (MIM) into the deepest layers of the network. This module fuses RGB and depth information so that each modality can enhance and correct the other. We also introduce a Pooling Attention Module (PAM) at various stages of the encoder to enhance the features extracted by the network. The outputs of the PAMs at different stages are selectively integrated into the decoder through a refinement module to improve semantic segmentation performance. Experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYU-Depth V2 and SUN-RGBD, by addressing the insufficient information interaction between modalities in RGB-D semantic segmentation. The source code is available at https://github.com/2295104718/MIPANet.
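The abstract does not specify how the Pooling Attention Module is built, so the following is only a generic sketch of pooling-based channel attention, not the authors' PAM. It assumes a squeeze-and-excitation style gate: each channel of a feature map is reduced to its mean by global average pooling, a linear layer plus sigmoid turns those means into per-channel gates, and each channel is rescaled by its gate. The function names and the `weights` and `bias` parameters are hypothetical and stand in for learned values.

```python
import math


def global_average_pool(feature_map):
    """Reduce a C x H x W feature map (nested lists) to C per-channel means."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def sigmoid(x):
    """Logistic function, used to squash gate values into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pooling_channel_attention(feature_map, weights, bias):
    """Rescale each channel by a gate computed from pooled channel statistics.

    Assumption: `weights` is a C x C matrix and `bias` a length-C vector of
    hypothetical learned parameters; they are not taken from the paper.
    """
    pooled = global_average_pool(feature_map)
    # One gate per channel: sigmoid of a linear combination of pooled means.
    gates = [
        sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
        for row, b in zip(weights, bias)
    ]
    # Multiply every value in a channel by that channel's gate.
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]


# Usage: two channels of size 2 x 2, with all-zero parameters so every gate
# equals sigmoid(0).
features = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]
attended = pooling_channel_attention(features, zero_weights, zero_bias)
```

With all-zero parameters every gate is 0.5, so the output is simply the input halved; with trained parameters the gates would differ per channel and emphasize some channels over others. The repository linked in the abstract is the place to check the actual module design.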

Pages: 13
