A Spatial Hierarchical Reasoning Network for Remote Sensing Visual Question Answering

Cited by: 1

Authors:

Zhang, Zixiao [1,2]

Jiao, Licheng [1,2]

Li, Lingling [1,2]

Liu, Xu [1,2]

Chen, Puhua [1,2]

Liu, Fang [1,2]

Li, Yuxuan [1,2]

Guo, Zhicheng [1,2]

Affiliations:

[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Int Res Ctr Intelligent Percept & Computat, Minist Educ, Joint Int Res Lab Intelligent Percept, Xian 710071, Shaanxi, Peoples R China

[2] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Shaanxi, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2023 / Vol. 61

Funding:

National Natural Science Foundation of China;

Keywords:

Visualization; Remote sensing; Cognition; Task analysis; Geospatial analysis; Semantics; Question answering (information retrieval); Attention mechanism; multiscale representation; relational reasoning; visual question answering on remote sensing (RSVQA);

DOI:

10.1109/TGRS.2023.3237606

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708 ; 070902 ;

Abstract:

For visual question answering on remote sensing (RSVQA), current methods rarely account for geospatial objects, which typically differ greatly in scale and are sensitive to position. In addition, modeling and reasoning about the relationships between entities has rarely been explored, which leads to one-sided and inaccurate answer predictions. In this article, a novel method called the spatial hierarchical reasoning network (SHRNet) is proposed, which endows a remote sensing (RS) visual question answering (VQA) system with enhanced visual-spatial reasoning capability. Specifically, a hash-based spatial multiscale visual representation module is first designed to encode multiscale visual features embedded with spatial positional information. Then, spatial hierarchical reasoning is conducted to learn the high-order intra-group object relations across multiple scales under the guidance of linguistic cues. Finally, a visual-question (VQ) interaction module is employed to learn an effective image-text joint embedding for the final answer prediction. Experimental results on three public RS VQA datasets confirm the effectiveness and superiority of SHRNet.

Pages: 15