LiT-4-RSVQA: Lightweight Transformer-Based Visual Question Answering in Remote Sensing

Times cited: 8

Authors:

Hackel, Leonard [1, 3]

Clasen, Kai Norman [1]

Ravanbakhsh, Mahdyar [2]

Demir, Beguem [1, 3]

Affiliations:

[1] Technische Universität Berlin, Berlin, Germany

[2] Zalando SE, Berlin, Germany

[3] BIFOLD, Berlin Institute for the Foundations of Learning and Data, Berlin, Germany

Source:

IGARSS 2023: 2023 IEEE International Geoscience and Remote Sensing Symposium (2023)

Funding:

European Research Council

Keywords:

visual question answering; natural language processing; lightweight transformer; remote sensing

DOI:

10.1109/IGARSS52108.2023.10281674

Chinese Library Classification (CLC) number:

P [Astronomy, Earth sciences]

Discipline classification code:

07

Abstract:

Visual question answering (VQA) methods in remote sensing (RS) aim to answer natural language questions with respect to an RS image. Most of the existing methods require a large amount of computational resources, which limits their application in operational scenarios in RS. To address this issue, in this paper we present an effective lightweight transformer-based VQA in RS (LiT-4-RSVQA) architecture for efficient and accurate VQA in RS. Our architecture consists of: i) a lightweight text encoder module; ii) a lightweight image encoder module; iii) a fusion module; and iv) a classification module. The experimental results obtained on a VQA benchmark dataset demonstrate that our proposed LiT-4-RSVQA architecture provides accurate VQA results while significantly reducing the computational requirements on the executing hardware.
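The four-module decomposition named in the abstract (text encoder, image encoder, fusion, classification) can be illustrated with a toy, dependency-free Python sketch. It is not the authors' implementation: the lightweight transformer encoders are replaced by deliberately tiny, untrained stand-ins (a hashed bag-of-words embedding and a mean-pooled ViT-style patch projection), element-wise multiplication is assumed as the fusion operator, and every name (`TextEncoder`, `ImageEncoder`, `fuse`, `Classifier`, `ToyVQAModel`, `DIM`) is hypothetical.

```python
import math
import random
import zlib
from typing import List, Sequence, Tuple

DIM = 8  # hypothetical shared embedding width for both modalities
Vector = List[float]


def _rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def _matvec(matrix: Sequence[Sequence[float]], vec: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def _mean(vectors: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class TextEncoder:
    """(i) Text encoder stand-in: hashed token embeddings, mean-pooled (no attention)."""

    def __init__(self, vocab_size: int, rng: random.Random) -> None:
        self.vocab_size = vocab_size
        self.table = _rand_matrix(vocab_size, DIM, rng)

    def __call__(self, question: str) -> Vector:
        tokens = question.lower().split() or ["<empty>"]
        # crc32 gives a deterministic token id, unlike Python's salted hash().
        ids = [zlib.crc32(t.encode("utf-8")) % self.vocab_size for t in tokens]
        return _mean([self.table[i] for i in ids])


class ImageEncoder:
    """(ii) Image encoder stand-in: ViT-style patch projection, mean-pooled (no attention)."""

    def __init__(self, patch: int, rng: random.Random) -> None:
        self.patch = patch
        self.proj = _rand_matrix(DIM, patch * patch, rng)

    def __call__(self, image: Sequence[Sequence[float]]) -> Vector:
        p = self.patch
        tokens = []
        # Cut the single-band image into non-overlapping p x p patches and project each one.
        for r in range(0, len(image) - p + 1, p):
            for c in range(0, len(image[0]) - p + 1, p):
                flat = [image[r + i][c + j] for i in range(p) for j in range(p)]
                tokens.append(_matvec(self.proj, flat))
        return _mean(tokens)


def fuse(text_vec: Vector, image_vec: Vector) -> Vector:
    """(iii) Fusion, assumed here to be element-wise multiplication."""
    return [t * v for t, v in zip(text_vec, image_vec)]


class Classifier:
    """(iv) Classification over a fixed answer set: linear layer + softmax."""

    def __init__(self, num_answers: int, rng: random.Random) -> None:
        self.weights = _rand_matrix(num_answers, DIM, rng)

    def __call__(self, fused: Vector) -> Vector:
        logits = _matvec(self.weights, fused)
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


class ToyVQAModel:
    """Chains the four stand-in modules: encode both inputs, fuse, classify."""

    def __init__(self, answers: Sequence[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.answers = list(answers)
        self.text = TextEncoder(vocab_size=1000, rng=rng)
        self.image = ImageEncoder(patch=4, rng=rng)
        self.classifier = Classifier(len(self.answers), rng)

    def predict(self, image, question: str) -> Tuple[str, Vector]:
        probs = self.classifier(fuse(self.text(question), self.image(image)))
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.answers[best], probs


if __name__ == "__main__":
    model = ToyVQAModel(["yes", "no"], seed=0)
    image = [[(r * 16 + c) / 255 for c in range(16)] for r in range(16)]  # 16x16 single band
    answer, probs = model.predict(image, "Are there water bodies in the image?")
    print(answer, [round(p, 3) for p in probs])
```

Because the weights are random and untrained, the printed answer is meaningless; the sketch only shows how the four modules hand data to one another. In the architecture the abstract describes, the two encoders are lightweight transformers, so neither the accuracy nor the efficiency of this toy says anything about LiT-4-RSVQA itself.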

Pages: 2231–2234

Page count: 4
