A Spatial Hierarchical Reasoning Network for Remote Sensing Visual Question Answering

Cited by: 1
Authors
Zhang, Zixiao [1, 2]
Jiao, Licheng [1, 2]
Li, Lingling [1, 2]
Liu, Xu [1, 2]
Chen, Puhua [1, 2]
Liu, Fang [1, 2]
Li, Yuxuan [1, 2]
Guo, Zhicheng [1, 2]
Affiliations
[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Int Res Ctr Intelligent Percept & Computat, Minist Educ,Joint Int Res Lab Intelligent Percept, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Shaanxi, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2023 / Vol. 61
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Remote sensing; Cognition; Task analysis; Geospatial analysis; Semantics; Question answering (information retrieval); Attention mechanism; multiscale representation; relational reasoning; visual question answering on remote sensing (RSVQA);
DOI
10.1109/TGRS.2023.3237606
Chinese Library Classification number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708 ; 070902 ;
Abstract
For visual question answering on remote sensing (RSVQA), current methods rarely account for the large scale differences and position-sensitive properties typical of geospatial objects. Moreover, modeling and reasoning about the relationships between entities have rarely been explored, which leads to one-sided and inaccurate answer predictions. In this article, a novel method called spatial hierarchical reasoning network (SHRNet) is proposed, which endows a remote sensing (RS) visual question answering (VQA) system with enhanced visual-spatial reasoning capability. Specifically, a hash-based spatial multiscale visual representation module is first designed to encode multiscale visual features embedded with spatial positional information. Then, spatial hierarchical reasoning is conducted to learn the high-order intra-group object relations across multiple scales under the guidance of linguistic cues. Finally, a visual-question (VQ) interaction module is employed to learn an effective image-text joint embedding for the final answer prediction. Experimental results on three public RS VQA datasets confirm the effectiveness and superiority of the proposed SHRNet model.
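The three stages named in the abstract (multiscale features carrying position, question-guided reasoning across scales, joint embedding) can be illustrated with a toy sketch. This is not SHRNet: the hash-based encoding and the actual reasoning and interaction modules are not described in this record, so plain average pooling, normalized cell coordinates, and dot-product softmax attention are swapped in as stand-ins. All function names and the 3-element query vector are hypothetical.

```python
import math

def avg_pool(grid, k):
    """Average-pool a square 2D grid with a k x k window and stride k.
    Assumes the grid side length is divisible by k."""
    n = len(grid)
    return [[sum(grid[i + a][j + b] for a in range(k) for b in range(k)) / (k * k)
             for j in range(0, n, k)]
            for i in range(0, n, k)]

def multiscale_tokens(grid, scales=(1, 2, 4)):
    """Return one token list per scale. Each token is
    (value, row_center, col_center), with centers normalized to [0, 1]
    so that position information survives pooling."""
    out = []
    for k in scales:
        pooled = avg_pool(grid, k)
        m = len(pooled)
        out.append([(pooled[i][j], (i + 0.5) / m, (j + 0.5) / m)
                    for i in range(m) for j in range(m)])
    return out

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def question_guided_summary(tokens, query):
    """Stand-in for language-guided reasoning: dot-product attention of a
    3-element query over the tokens, returning the weighted mean token."""
    scores = [sum(q * t for q, t in zip(query, tok)) for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(3)]

def answer_features(grid, query, scales=(1, 2, 4)):
    """Stand-in for the joint embedding: concatenate the attended
    summary of every scale into one feature vector."""
    feats = []
    for tokens in multiscale_tokens(grid, scales):
        feats.extend(question_guided_summary(tokens, query))
    return feats

# Toy 4 x 4 "image" and a hypothetical question vector.
grid = [[float(4 * i + j) for j in range(4)] for i in range(4)]
features = answer_features(grid, [0.1, 0.0, 0.0])
print(len(features))
```

In a real system the query would come from a text encoder and the concatenated features would feed a classifier over candidate answers; both are omitted here.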
Pages: 15