EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

Times cited: 0
Authors
Wang, Junjue [1]
Zheng, Zhuo [2]
Chen, Zihang [1]
Ma, Ailong [1]
Zhong, Yanfei [1]
Affiliations
[1] Wuhan Univ, LIESMARS, Wuhan 430074, Peoples R China
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
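Funding
National Natural Science Foundation of China
Keywords
IMAGERY
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract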
Earth vision research typically focuses on extracting the locations and categories of geospatial objects but neglects the relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multimodal, multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6,000 images, the corresponding semantic masks, and 208,593 QA pairs that embed urban and rural governance requirements. Because objects are the basis for complex relational reasoning, we propose a Semantic OBject Awareness framework (SOBA) that advances VQA in an object-centric way. To preserve refined spatial locations and semantics, SOBA uses a segmentation network to generate object semantics. Object-guided attention aggregates the features inside each object via pseudo masks, and bidirectional cross-attention then models the relations between objects hierarchically. To optimize object counting, we propose a numerical difference loss that dynamically adds difference penalties, unifying the classification and regression tasks. Experimental results show that SOBA outperforms advanced general-purpose and remote sensing methods. We believe this dataset and framework provide a strong benchmark for complex analysis in Earth vision. The project page is at https://Junjue-Wang.github.io/homepage/EarthVQA.
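The abstract describes the numerical difference loss only as adding "difference penalties" dynamically so that counting is treated as both classification and regression; the exact formula is not given in this record. The sketch below is one plausible form under stated assumptions, not the authors' implementation: each count value is a class, a standard cross-entropy supplies the classification term, and the cross-entropy is scaled by a penalty that grows with the distance between the expected count under the predicted distribution and the true count. The function names `softmax` and `numerical_difference_loss`, the weight `alpha`, and the use of the expected count as the regression proxy are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def numerical_difference_loss(logits, targets, alpha=1.0):
    """Hypothetical count loss: cross-entropy weighted by numerical distance.

    logits:  (N, K) scores, where class k stands for the count k.
    targets: (N,) integer ground-truth counts in [0, K).
    alpha:   assumed weight of the difference penalty (not from the paper).
    """
    probs = softmax(logits)
    n, k = probs.shape
    # Classification term: standard cross-entropy on the true count class.
    ce = -np.log(probs[np.arange(n), targets] + 1e-12)
    # Regression view (assumption): the expected count under the predicted distribution.
    expected = probs @ np.arange(k)
    # Dynamic penalty that grows with the distance from the true count.
    penalty = 1.0 + alpha * np.abs(expected - targets)
    return float(np.mean(penalty * ce))


# Usage: two wrong predictions for a true count of 3.
target = np.array([3])
near = np.array([[0.0, 0.0, 4.0, 2.0, 0.0]])  # most mass on count 2 (off by one)
far = np.array([[4.0, 0.0, 0.0, 2.0, 0.0]])   # most mass on count 0 (off by three)
loss_near = numerical_difference_loss(near, target)
loss_far = numerical_difference_loss(far, target)
```

Both example predictions assign the same probability to the true count 3, so their cross-entropy terms are identical; only the distance-based penalty separates them, and the prediction that is numerically further off receives the larger loss. Setting `alpha=0.0` reduces the sketch to plain cross-entropy, which illustrates how a distance term can add a regression signal to a classification loss.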
Pages: 5481-5489
Number of pages: 9