REFERRING IMAGE SEGMENTATION FOR REMOTE SENSING DATA

Cited by: 0

Authors:

Yuan, Zhenghang [1]

Mou, Lichao [1]

Hua, Yuansheng [2]

Zhu, Xiao Xiang [1]

Affiliations:

[1] Technical University of Munich (TUM), Data Science in Earth Observation, Munich, Germany

[2] Shenzhen University, College of Civil and Transportation Engineering, Shenzhen, People's Republic of China

Source:

IGARSS 2024 - 2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024 | 2024

Keywords:

Referring image segmentation; remote sensing; vision-language task

DOI:

10.1109/IGARSS53475.2024.10642726

Abstract:

In this paper, we present a new task: referring image segmentation for remote sensing data, which targets segmenting out specific objects referred to by natural language. Due to the absence of a dataset for this task, we construct a dataset based on the SkyScapes dataset. Our dataset is designed with linguistically structured expressions that focus on object categories, attributes, and spatial relationships, enabling the generation of binary masks from semantic segmentation maps. To benchmark this task, we evaluate and compare the performance of three different convolutional neural network (CNN)-based methods and a Transformer-based method. Experimental results provide valuable insights into the adaptability of these methods to remote sensing data, highlighting the potential of our dataset as a resource for the remote sensing community to further explore vision-language tasks.


Pages: 946-949

Page count: 4

References (12 in total; the first 10 are listed):

[1] Azimi, Seyed Majid; Henry, Corentin; Sommer, Lars; Schumann, Arne; Vig, Eleonora. SkyScapes - Fine-Grained Semantic Understanding of Aerial Scenes. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 7392-7402
[2] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[3] Hu, Ronghang; Rohrbach, Marcus; Darrell, Trevor. Segmentation from Natural Language Expressions. COMPUTER VISION - ECCV 2016, PT I, 2016, 9905: 108-124
[4] Hu, Zhiwei; Feng, Guang; Sun, Jiayu; Zhang, Lihe; Lu, Huchuan. Bi-directional Relationship Inferring Network for Referring Image Segmentation. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 4423-4432
[5] Loshchilov I., 2018, INT C LEARN REPR, DOI 10.48550/ARXIV.1711.05101
[6] Sumbul G., 2020, IEEE T GEOSCIENCE RE
[7] Xiong Zhitong, 2022, ARXIV
[8] Xiong Zhitong, 2024, ARXIV
[9] Yang, Zhao; Wang, Jiaqi; Tang, Yansong; Chen, Kai; Zhao, Hengshuang; Torr, Philip H. S. LAVT: Language-Aware Vision Transformer for Referring Image Segmentation. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 18134-18144
[10] Ye, Linwei; Rochan, Mrigank; Liu, Zhi; Wang, Yang. Cross-Modal Self-Attention Network for Referring Image Segmentation. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 10494-10503
