EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

Cited by: 0

Authors:

Wang, Junjue [1]

Zheng, Zhuo [2]

Chen, Zihang [1]

Ma, Ailong [1]

Zhong, Yanfei [1]

Affiliations:

[1] Wuhan Univ, LIESMARS, Wuhan 430074, Peoples R China

[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

IMAGERY;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Earth vision research typically focuses on extracting geospatial object locations and categories but neglects the exploration of relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multimodal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6000 images, corresponding semantic masks, and 208,593 QA pairs with urban and rural governance requirements embedded. As objects are the basis for complex relational reasoning, we propose a Semantic OBject Awareness framework (SOBA) to advance VQA in an object-centric way. To preserve refined spatial locations and semantics, SOBA leverages a segmentation network for object semantics generation. The object-guided attention aggregates object interior features via pseudo masks, and bidirectional cross-attention further models object external relations hierarchically. To optimize object counting, we propose a numerical difference loss that dynamically adds difference penalties, unifying the classification and regression tasks. Experimental results show that SOBA outperforms both advanced general and remote sensing methods. We believe this dataset and framework provide a strong benchmark for Earth vision's complex analysis. The project page is at https://Junjue-Wang.github.io/homepage/EarthVQA.

Citation

Pages: 5481-5489

Page count: 9
