Cross-Modal Visual Question Answering for Remote Sensing Data

Cited by: 1

Authors:

Felix, Rafael [1]

Repasky, Boris [1,2]

Hodge, Samuel [1]

Zolfaghari, Reza [3]

Abbasnejad, Ehsan [2]

Sherrah, Jamie [2]

Affiliations:

[1] Australian Inst Machine Learning, Adelaide, SA, Australia

[2] Lockheed Martin Australia STELaRLab, Mawson Lakes, Australia

[3] Def Sci & Technol Grp, Canberra, ACT, Australia

Source:

2021 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA 2021) | 2021

Keywords:

Visual Question Answering; Deep learning; Natural Language Processing; Convolution Neural Networks; Recurrent Neural Networks; OpenStreetMap; CLASSIFICATION;

DOI:

10.1109/DICTA52665.2021.9647287

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

While querying of structured geo-spatial data such as Google Maps has become commonplace, there remains a wealth of unstructured information in overhead imagery that is largely inaccessible to users. This information can be made accessible using machine learning for Visual Question Answering (VQA) about remote sensing imagery. We propose a novel method for Earth observation based on answering natural language questions about satellite images that uses cross-modal attention between image objects and text. The image is encoded with an object-centric feature space, with self-attention between objects, and the question is encoded with a language transformer network. The image and question representations are fed to a cross-modal transformer network that uses cross-attention between the image and text modalities to generate the answer. Our method is applied to the RSVQA remote sensing dataset and achieves a significant accuracy increase over the previous benchmark.
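
The cross-attention step named in the abstract can be illustrated with a minimal sketch of scaled dot-product attention, in which question-token vectors act as queries over image-object vectors. This is not the authors' implementation: the function names, the toy two-dimensional embeddings, and the omission of learned projection matrices, multiple heads, and stacked layers are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    Returns the attended vectors and the attention weights per query.
    No learned projections are applied (an assumption of this sketch).
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append(
            [sum(wj * v[d] for wj, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical toy inputs: 2 question-token embeddings, 3 image-object embeddings.
question_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Text queries attend over image objects (one direction of cross-attention).
attended, attn = cross_attention(question_tokens, image_objects, image_objects)
```

Swapping the roles of `question_tokens` and `image_objects` gives the other direction (image objects attending over text), and passing the same sequence as queries, keys, and values gives self-attention between objects.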

Pages: 57-65

Number of pages: 9

50 records in total

[21] VISUAL QUESTION ANSWERING IN REMOTE SENSING WITH CROSS-ATTENTION AND MULTIMODAL INFORMATION BOTTLENECK
Songara, Jayesh
Pande, Shivam
Choudhury, Shabnam
Banerjee, Biplab
Velmurugan, Rajbabu
IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023: 6278-6281
[22] Bi-Modal Transformer-Based Approach for Visual Question Answering in Remote Sensing Imagery
Bazi, Yakoub
Al Rahhal, Mohamad Mahmoud
Mekhalfi, Mohamed Lamine
Al Zuair, Mansour Abdulaziz
Melgani, Farid
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
[23] Deep Cross-Modal Image-Voice Retrieval in Remote Sensing
Chen, Yaxiong
Lu, Xiaoqiang
Wang, Shuai
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2020, 58 (10): 7049-7061
[24] Embedding Spatial Relations in Visual Question Answering for Remote Sensing
Faure, Maxime
Lobry, Sylvain
Kurtz, Camille
Wendling, Laurent
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 310-316
[25] Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Zhu, Zihao
Yu, Jing
Wang, Yujing
Sun, Yajing
Hu, Yue
Wu, Qi
PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020: 1097-1103
[26] Gated Multi-modal Fusion with Cross-modal Contrastive Learning for Video Question Answering
Lyu, Chenyang
Li, Wenxi
Ji, Tianbo
Zhou, Liting
Gurrin, Cathal
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII, 2023, 14260: 427-438
[27] Enhancing Visual Question Answering with Prompt-based Learning: A Cross-modal Approach for Deep Semantic Understanding
Zhu, Shuaiyu
Peng, Shuo
Chen, Shengbo
PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ALGORITHMS, SOFTWARE ENGINEERING, AND NETWORK SECURITY, ASENS 2024, 2024: 713-717
[28] CroMIC-QA: The Cross-Modal Information Complementation Based Question Answering
Qian, Shun
Liu, Bingquan
Sun, Chengjie
Xu, Zhen
Ma, Lin
Wang, Baoxun
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 8348-8359
[29] VCD: Visual Causality Discovery for Cross-Modal Question Reasoning
Liu, Yang
Tan, Ying
Luo, Jingzhou
Chen, Weixing
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII, 2024, 14431: 309-322
[30] Robust visual question answering via semantic cross modal augmentation
Mashrur, Akib
Luo, Wei
Zaidi, Nayyar A.
Robles-Kelly, Antonio
COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 238