EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

Times cited: 0
Authors
Wang, Junjue [1]
Zheng, Zhuo [2]
Chen, Zihang [1]
Ma, Ailong [1]
Zhong, Yanfei [1]
Affiliations
[1] Wuhan Univ, LIESMARS, Wuhan 430074, Peoples R China
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
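Funding
National Natural Science Foundation of China;
Keywords
IMAGERY;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract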
Earth vision research typically focuses on extracting geospatial object locations and categories but neglects the exploration of relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multimodal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6000 images, corresponding semantic masks, and 208,593 QA pairs with urban and rural governance requirements embedded. As objects are the basis for complex relational reasoning, we propose a Semantic OBject Awareness framework (SOBA) to advance VQA in an object-centric way. To preserve refined spatial locations and semantics, SOBA leverages a segmentation network for object semantics generation. The object-guided attention aggregates object interior features via pseudo masks, and bidirectional cross-attention further models object external relations hierarchically. To optimize object counting, we propose a numerical difference loss that dynamically adds difference penalties, unifying the classification and regression tasks. Experimental results show that SOBA outperforms both advanced general and remote sensing methods. We believe this dataset and framework provide a strong benchmark for Earth vision's complex analysis. The project page is at https://Junjue-Wang.github.io/homepage/EarthVQA.
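The abstract describes the numerical difference loss only in words. As a rough illustration, and not the authors' formulation, the sketch below treats counting as classification over integer count classes and scales the cross-entropy by the distance between the predicted and true counts. The function name `difference_weighted_ce`, the parameters `alpha` and `gamma`, and the `(1 + alpha * |diff| ** gamma)` weighting are all assumptions made for this example; the paper's actual loss may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def difference_weighted_ce(logits, target, alpha=1.0, gamma=1.0):
    """Cross-entropy over count classes with a distance-based penalty.

    Hypothetical sketch: class index i stands for the count i. The
    penalty grows with |predicted count - true count|, so a far miss
    costs more than a near miss with the same cross-entropy.
    """
    probs = softmax(logits)
    ce = -math.log(probs[target])
    # Predicted count is the arg-max class.
    predicted = max(range(len(logits)), key=lambda i: logits[i])
    penalty = alpha * abs(predicted - target) ** gamma
    return (1.0 + penalty) * ce


# Same cross-entropy, different distance from the true count (2).
near = difference_weighted_ce([0, 0, 0, 3, 0], target=2)  # predicts 3
far = difference_weighted_ce([0, 0, 0, 0, 3], target=2)   # predicts 4
print(near < far)
```

When the arg-max class equals the target, the penalty term vanishes and the value reduces to plain cross-entropy, which is one way a single objective can combine a classification term with a regression-style distance term.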
Pages: 5481-5489
Number of pages: 9
Related papers
50 in total
  • [21] HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering
    Liu, Fei
    Liu, Jing
    Wang, Weining
    Lu, Hanqing
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1678 - 1687
  • [22] Multi-scale Relational Reasoning with Regional Attention for Visual Question Answering
    Ma, Yuntao
    Lu, Tong
    Wu, Yirui
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5642 - 5649
  • [23] Embedding Spatial Relations in Visual Question Answering for Remote Sensing
    Faure, Maxime
    Lobry, Sylvain
    Kurtz, Camille
    Wendling, Laurent
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 310 - 316
  • [24] Co-LLaVA: Efficient Remote Sensing Visual Question Answering via Model Collaboration
    Liu, Fan
    Dai, Wenwen
    Zhang, Chuanyi
    Zhu, Jiale
    Yao, Liang
    Li, Xin
    REMOTE SENSING, 2025, 17 (03)
  • [25] Graph-based relational reasoning network for video question answering
    Tan, Tao
    Sun, Guanglu
    MACHINE VISION AND APPLICATIONS, 2025, 36 (01)
  • [26] Cross-Modal Visual Question Answering for Remote Sensing Data
    Felix, Rafael
    Repasky, Boris
    Hodge, Samuel
    Zolfaghari, Reza
    Abbasnejad, Ehsan
    Sherrah, Jamie
    2021 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA 2021), 2021, : 57 - 65
  • [27] Medical Visual Question Answering via Conditional Reasoning and Contrastive Learning
    Liu, Bo
    Zhan, Li-Ming
    Xu, Li
    Wu, Xiao-Ming
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2023, 42 (05) : 1532 - 1545
  • [28] Open-ended remote sensing visual question answering with transformers
    Al Rahhal, Mohamad M.
    Bazi, Yakoub
    Alsaleh, Sara O.
    Al-Razgan, Muna
    Mekhalfi, Mohamed Lamine
    Al Zuair, Mansour
    Alajlan, Naif
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2022, 43 (18) : 6809 - 6823
  • [29] Mutual Attention Inception Network for Remote Sensing Visual Question Answering
    Zheng, Xiangtao
    Wang, Binqiang
    Du, Xingqian
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [30] Explicit Knowledge-based Reasoning for Visual Question Answering
    Wang, Peng
    Wu, Qi
    Shen, Chunhua
    Dick, Anthony
    van den Hengel, Anton
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1290 - 1296