Multimodal-Aware Fusion Network for Referring Remote Sensing Image Segmentation

Cited by: 0
Authors
Shi, Leideng [1]
Zhang, Juan [1]
Affiliations
[1] Shanghai University of Engineering Science, School of Electronic and Electrical Engineering, Shanghai 201600, People's Republic of China
Keywords
Visualization; Image segmentation; Remote sensing; Convolution; Transformers; Feature extraction; Noise; Correlation; Linguistics; Accuracy; Multimodal feature fusion; referring image segmentation; remote sensing images; semantic segmentation; Swin Transformer
DOI
10.1109/LGRS.2025.3527485
Chinese Library Classification number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Referring remote sensing image segmentation (RRSIS) is a novel visual task in remote sensing image segmentation that aims to segment objects based on a given text description and is of great practical significance. Previous studies fuse the visual and linguistic modalities through explicit feature interaction, which fails to effectively extract useful multimodal information from the dual-branch encoder. In this letter, we design a multimodal-aware fusion network (MAFN) to achieve fine-grained alignment and fusion between the two modalities. We propose a correlation fusion module (CFM) that enhances multiscale visual features by introducing adaptive noise in the transformer and integrates cross-modal aware features. In addition, MAFN employs multiscale refinement convolution (MSRC) to adapt to the various orientations of objects at different scales, boosting their representation ability and enhancing segmentation accuracy. Extensive experiments show that MAFN is significantly more effective than the state of the art (SOTA) on the RRSIS-D dataset. The source code is available at https://github.com/Roaxy/MAFN.
Pages: 5