LIT-4-RSVQA: LIGHTWEIGHT TRANSFORMER-BASED VISUAL QUESTION ANSWERING IN REMOTE SENSING

Cited by: 8
Authors
Hackel, Leonard [1, 3]
Clasen, Kai Norman [1]
Ravanbakhsh, Mahdyar [2]
Demir, Begüm [1, 3]
Affiliations
[1] Tech Univ Berlin, Berlin, Germany
[2] Zalando SE, Berlin, Germany
[3] BIFOLD Berlin Inst Foundat Learning & Data, Berlin, Germany
Source
IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM | 2023
Funding
European Research Council;
Keywords
visual question answering; natural language processing; lightweight transformer; remote sensing;
DOI
10.1109/IGARSS52108.2023.10281674
Chinese Library Classification number
P [Astronomy, Earth Sciences];
Discipline classification code
07;
Abstract
Visual question answering (VQA) methods in remote sensing (RS) aim to answer natural language questions with respect to an RS image. Most of the existing methods require a large amount of computational resources, which limits their application in operational scenarios in RS. To address this issue, in this paper we present an effective lightweight transformer-based VQA in RS (LiT-4-RSVQA) architecture for efficient and accurate VQA in RS. Our architecture consists of: i) a lightweight text encoder module; ii) a lightweight image encoder module; iii) a fusion module; and iv) a classification module. The experimental results obtained on a VQA benchmark dataset demonstrate that our proposed LiT-4-RSVQA architecture provides accurate VQA results while significantly reducing the computational requirements on the executing hardware.
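The four-module layout named in the abstract can be illustrated with a minimal data-flow sketch. This is not the authors' implementation: the class name `ToyRSVQA`, the hidden size, the concatenation-based fusion, and the random untrained linear maps that stand in for the lightweight transformer encoders are all assumptions made for illustration. The sketch only shows how a question and an image pass through a text encoder, an image encoder, a fusion step, and a classifier over a fixed answer set.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product using plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class ToyRSVQA:
    """Hypothetical four-module pipeline: text encoder, image encoder, fusion, classifier."""

    def __init__(self, vocab: List[str], answers: List[str],
                 patch_dim: int, hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vocab = {word: i for i, word in enumerate(vocab)}
        self.answers = answers
        # One extra embedding row is reserved for out-of-vocabulary words.
        self.token_emb = random_matrix(len(vocab) + 1, hidden, rng)
        self.patch_proj = random_matrix(hidden, patch_dim, rng)
        self.fusion = random_matrix(hidden, 2 * hidden, rng)
        self.classifier = random_matrix(len(answers), hidden, rng)

    def encode_text(self, question: str) -> Vector:
        # i) text encoder: embed each token, then mean-pool (assumed simplification).
        unk = len(self.vocab)
        ids = [self.vocab.get(w, unk) for w in question.lower().split()]
        return mean_pool([self.token_emb[i] for i in ids])

    def encode_image(self, patches: List[Vector]) -> Vector:
        # ii) image encoder: project each flattened patch, then mean-pool.
        return mean_pool([matvec(self.patch_proj, p) for p in patches])

    def fuse(self, text_feat: Vector, image_feat: Vector) -> Vector:
        # iii) fusion: concatenate both modalities, project, apply ReLU (assumed).
        return [max(0.0, x) for x in matvec(self.fusion, text_feat + image_feat)]

    def classify(self, fused: Vector) -> str:
        # iv) classification: pick the highest-scoring answer from a fixed set.
        logits = matvec(self.classifier, fused)
        return self.answers[logits.index(max(logits))]

    def answer(self, question: str, patches: List[Vector]) -> str:
        return self.classify(self.fuse(self.encode_text(question),
                                       self.encode_image(patches)))


# Usage with made-up inputs: two 4-value "patches" and a yes/no answer set.
model = ToyRSVQA(vocab=["is", "there", "water", "forest"],
                 answers=["yes", "no"], patch_dim=4)
patches = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
prediction = model.answer("Is there water", patches)
```

Because the weights are random, the predicted answer carries no meaning; the point is the module boundaries. Treating VQA as classification over a fixed answer set matches the classification module listed in the abstract, while the specific fusion operator used in the paper is not stated in this record.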
Pages: 2231-2234
Number of pages: 4