REFERRING IMAGE SEGMENTATION FOR REMOTE SENSING DATA

Cited by: 0
Authors
Yuan, Zhenghang [1]
Mou, Lichao [1]
Hua, Yuansheng [2]
Zhu, Xiao Xiang [1]
Affiliations
[1] Tech Univ Munich TUM, Data Sci Earth Observat, Munich, Germany
[2] Shenzhen Univ, Coll Civil & Transportat Engn, Shenzhen, Peoples R China
Source
IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024 | 2024
Keywords
Referring image segmentation; remote sensing; vision-language task
DOI
10.1109/IGARSS53475.2024.10642726
Abstract
In this paper, we present a new task: referring image segmentation for remote sensing data, which targets segmenting out specific objects referred to by natural language. Due to the absence of a dataset for this task, we construct a dataset based on the SkyScapes dataset. Our dataset is designed with linguistically structured expressions that focus on object categories, attributes, and spatial relationships, enabling the generation of binary masks from semantic segmentation maps. To benchmark this task, we evaluate and compare the performance of three different convolutional neural network (CNN)-based methods and a Transformer-based method. Experimental results provide valuable insights into the adaptability of these methods to remote sensing data, highlighting the potential of our dataset as a resource for the remote sensing community to further explore vision-language tasks.
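As a rough illustration of the mask-generation idea mentioned in the abstract, the sketch below derives a binary mask from a semantic segmentation map by selecting the pixels of one category. The class IDs, the `binary_mask_for_category` helper, and the toy label map are all hypothetical. The abstract does not specify the authors' actual procedure, which also draws on attributes and spatial relationships.

```python
from typing import List

# Hypothetical class IDs for illustration only; not the SkyScapes label scheme.
ROAD, BUILDING, CAR = 1, 2, 3


def binary_mask_for_category(label_map: List[List[int]], category_id: int) -> List[List[int]]:
    """Return a mask with 1 where the pixel's class equals category_id, else 0."""
    return [[1 if pixel == category_id else 0 for pixel in row] for row in label_map]


# Toy 3x4 semantic segmentation map (assumed data, not taken from the dataset).
label_map = [
    [ROAD, ROAD, BUILDING, BUILDING],
    [ROAD, CAR, BUILDING, BUILDING],
    [ROAD, ROAD, ROAD, CAR],
]

# An expression such as "the cars" would, in this simplified view,
# map to the CAR category and yield the mask below.
car_mask = binary_mask_for_category(label_map, CAR)
for row in car_mask:
    print(row)
```

Expressions that involve attributes or spatial relationships would need further filtering on top of this category-level selection, which the sketch does not attempt.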
Pages: 946-949
Number of pages: 4
Related papers
12 in total
  • [1] SkyScapes - Fine-Grained Semantic Understanding of Aerial Scenes
    Azimi, Seyed Majid
    Henry, Corentin
    Sommer, Lars
    Schumann, Arne
    Vig, Eleonora
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 7392 - 7402
  • [2] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
  • [3] Segmentation from Natural Language Expressions
    Hu, Ronghang
    Rohrbach, Marcus
    Darrell, Trevor
    [J]. COMPUTER VISION - ECCV 2016, PT I, 2016, 9905 : 108 - 124
  • [4] Bi-directional Relationship Inferring Network for Referring Image Segmentation
    Hu, Zhiwei
    Feng, Guang
    Sun, Jiayu
    Zhang, Lihe
    Lu, Huchuan
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 4423 - 4432
  • [5] Loshchilov I., 2018, INT C LEARN REPR, DOI DOI 10.48550/ARXIV.1711.05101
  • [6] Sumbul G., 2020, IEEE T GEOSCIENCE RE
  • [7] Xiong Zhitong, 2022, ARXIV
  • [8] Xiong Zhitong, 2024, ARXIV
  • [9] LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
    Yang, Zhao
    Wang, Jiaqi
    Tang, Yansong
    Chen, Kai
    Zhao, Hengshuang
    Torr, Philip H. S.
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18134 - 18144
  • [10] Cross-Modal Self-Attention Network for Referring Image Segmentation
    Ye, Linwei
    Rochan, Mrigank
    Liu, Zhi
    Wang, Yang
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10494 - 10503