EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

Authors
Wang, Junjue [1]
Zheng, Zhuo [2]
Chen, Zihang [1]
Ma, Ailong [1]
Zhong, Yanfei [1]
Affiliations
[1] Wuhan Univ, LIESMARS, Wuhan 430074, Peoples R China
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
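Funding
National Natural Science Foundation of China
Keywords
Imagery
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405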
Abstract
Earth vision research typically focuses on extracting the locations and categories of geospatial objects but neglects the relations between objects and comprehensive reasoning over them. Motivated by city planning needs, we develop EarthVQA, a multimodal, multi-task VQA dataset that advances relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6,000 images, the corresponding semantic masks, and 208,593 QA pairs that embed urban and rural governance requirements. Because objects are the basis of complex relational reasoning, we propose the Semantic OBject Awareness framework (SOBA) to advance VQA in an object-centric way. To preserve fine-grained spatial locations and semantics, SOBA uses a segmentation network to generate object semantics. Object-guided attention aggregates the interior features of each object via pseudo masks, and bidirectional cross-attention then models the external relations of objects hierarchically. To optimize object counting, we propose a numerical difference loss that dynamically adds difference penalties, unifying the classification and regression tasks. Experimental results show that SOBA outperforms advanced general-purpose and remote sensing methods. We believe this dataset and framework provide a strong benchmark for complex analysis in Earth vision. The project page is at https://Junjue-Wang.github.io/homepage/EarthVQA.
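The abstract describes the numerical difference loss only at a high level (difference penalties added dynamically, unifying classification and regression) and does not give its formula. As an illustration of that idea only, the following is a minimal sketch under stated assumptions: counting answers are treated as classes, a cross-entropy term handles classification, and the cross-entropy is scaled by the absolute gap between the softmax-expected count and the true count. The multiplicative form, the expected-count term, the weight `alpha`, and the function names are hypothetical choices, not SOBA's actual loss.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def numerical_difference_loss(
    logits: Sequence[float],
    counts: Sequence[int],
    target_index: int,
    alpha: float = 1.0,
) -> float:
    """Hypothetical counting loss: cross-entropy scaled by a numeric-difference penalty.

    logits       -- classifier scores, one per candidate count answer
    counts       -- numeric value represented by each answer class (e.g. [0, 1, 2, 3])
    target_index -- index of the ground-truth answer class
    alpha        -- assumed weight of the difference penalty (not from the paper)
    """
    probs = softmax(logits)
    # Classification term: standard cross-entropy on the true answer class.
    ce = -math.log(probs[target_index])
    # Regression-style term: gap between the expected predicted count and the
    # true count, so probability mass on far-off counts costs more than near misses.
    expected = sum(p * c for p, c in zip(probs, counts))
    diff = abs(expected - counts[target_index])
    # The penalty changes with the prediction, so the weighting is "dynamic".
    return (1.0 + alpha * diff) * ce


if __name__ == "__main__":
    counts = [0, 1, 2, 3]
    # Both predictions give the true class (count 1) the same probability,
    # but "near" puts extra mass on count 2 while "far" puts it on count 3.
    near = numerical_difference_loss([0.0, 1.0, 2.0, 0.0], counts, target_index=1)
    far = numerical_difference_loss([0.0, 1.0, 0.0, 2.0], counts, target_index=1)
    print(f"near miss: {near:.4f}, far miss: {far:.4f}")
```

Plain cross-entropy scores the two predictions in the usage example identically, because the true class receives the same probability in both; with the assumed difference penalty, the prediction that favors count 3 receives the larger loss. Setting `alpha=0.0` recovers ordinary cross-entropy.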
Pages: 5481-5489
Page count: 9