RRSIS: Referring Remote Sensing Image Segmentation

Cited by: 20
Authors
Yuan, Zhenghang [1]
Mou, Lichao [1]
Hua, Yuansheng [2]
Zhu, Xiao Xiang [1,3]
Affiliations
[1] Tech Univ Munich, Chair Data Sci Earth Observat, D-80333 Munich, Germany
[2] Shenzhen Univ, Coll Civil & Transportat Engn, Shenzhen 518060, Peoples R China
[3] Munich Ctr Machine Learning, D-80333 Munich, Germany
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62
Keywords
Deep learning; natural language; referring image segmentation; remote sensing
DOI
10.1109/TGRS.2024.3369720
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Localizing desired objects in remote sensing images is of great use in practical applications. Referring image segmentation, which aims to segment the objects to which a given expression refers, has been extensively studied for natural images. However, this task has received almost no research attention for remote sensing imagery. Considering its potential for real-world applications, in this article we introduce referring remote sensing image segmentation (RRSIS) to fill this gap and make some insightful explorations. Specifically, we create a new dataset, called RefSegRS, for this task, enabling us to evaluate different methods. We then benchmark referring image segmentation methods designed for natural images on the RefSegRS dataset and find that these models show limited efficacy in detecting small and scattered objects. To alleviate this issue, we propose a language-guided cross-scale enhancement (LGCE) module that uses linguistic features to adaptively enhance multiscale visual features by integrating both deep and shallow features. The proposed dataset, benchmarking results, and the designed LGCE module provide insights into the design of a better RRSIS model. The dataset and code will be available at https://gitlab.lrz.de/ai4eo/reasoning/rrsis.
Pages: 1-12
Page count: 12