A Spatial Hierarchical Reasoning Network for Remote Sensing Visual Question Answering

Cited by: 1
Authors
Zhang, Zixiao [1 ,2 ]
Jiao, Licheng [1 ,2 ]
Li, Lingling [1 ,2 ]
Liu, Xu [1 ,2 ]
Chen, Puhua [1 ,2 ]
Liu, Fang [1 ,2 ]
Li, Yuxuan [1 ,2 ]
Guo, Zhicheng [1 ,2 ]
Affiliations
[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Int Res Ctr Intelligent Percept & Computat, Minist Educ, Joint Int Res Lab Intelligent Percept, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Shaanxi, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2023 / Vol. 61
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Remote sensing; Cognition; Task analysis; Geospatial analysis; Semantics; Question answering (information retrieval); Attention mechanism; multiscale representation; relational reasoning; visual question answering on remote sensing (RSVQA);
DOI
10.1109/TGRS.2023.3237606
Chinese Library Classification (CLC)
P3 [Geophysics]; P59 [Geochemistry];
Subject classification codes
0708; 070902;
Abstract
For visual question answering on remote sensing (RSVQA), current methods scarcely account for geospatial objects, which typically exhibit large scale differences and position-sensitive properties. In addition, modeling and reasoning about the relationships between entities has rarely been explored, which leads to one-sided and inaccurate answer predictions. In this article, a novel method called the spatial hierarchical reasoning network (SHRNet) is proposed, which endows a remote sensing (RS) visual question answering (VQA) system with enhanced visual-spatial reasoning capability. Specifically, a hash-based spatial multiscale visual representation module is first designed to encode multiscale visual features embedded with spatial positional information. Then, spatial hierarchical reasoning is conducted to learn high-order inner-group object relations across multiple scales under the guidance of linguistic cues. Finally, a visual-question (VQ) interaction module is employed to learn an effective image-text joint embedding for final answer prediction. Experimental results on three public RSVQA datasets confirm the effectiveness and superiority of the proposed SHRNet.
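The abstract outlines a three-stage pipeline: multiscale visual encoding with spatial positional information, question-guided hierarchical reasoning over object relations, and a visual-question interaction step for answer prediction. Below is a minimal, speculative PyTorch sketch of such a pipeline. The module designs, class names (SpatialMultiScaleEncoder, QuestionGuidedReasoning, SHRNetSketch), dimensions, and answer-set size are all illustrative assumptions and are not taken from the paper, which does not publish implementation details in this record.

# Minimal, speculative sketch of the pipeline described in the abstract.
# All module designs, names, and sizes below are assumptions for illustration.
import torch
import torch.nn as nn

class SpatialMultiScaleEncoder(nn.Module):
    """Assumed stand-in for the multiscale visual representation module:
    projects per-scale visual tokens to a common width and adds learned
    positional embeddings (the paper's hash-based scheme is not reproduced)."""
    def __init__(self, in_dims=(256, 512, 1024), d_model=512, tokens_per_scale=49):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, d_model) for d in in_dims)
        self.pos = nn.ParameterList(
            nn.Parameter(torch.zeros(1, tokens_per_scale, d_model)) for _ in in_dims
        )

    def forward(self, feats):  # feats: list of (B, N, C_i) per-scale token maps
        return [p(f) + e for p, f, e in zip(self.proj, feats, self.pos)]

class QuestionGuidedReasoning(nn.Module):
    """Assumed relational-reasoning step: self-attention over the visual tokens
    of each scale, conditioned on the question embedding prepended as a token."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, scale_tokens, q):  # q: (B, d_model) question embedding
        fused = []
        for v in scale_tokens:
            x = torch.cat([q.unsqueeze(1), v], dim=1)   # prepend linguistic cue
            out, _ = self.attn(x, x, x)
            fused.append(out[:, 0])                      # question-attended summary
        return torch.stack(fused, dim=1).mean(dim=1)     # merge across scales

class SHRNetSketch(nn.Module):
    """End-to-end sketch: encode, reason, fuse with the question, predict answer."""
    def __init__(self, d_model=512, n_answers=100):      # n_answers is a placeholder
        super().__init__()
        self.visual = SpatialMultiScaleEncoder(d_model=d_model)
        self.reason = QuestionGuidedReasoning(d_model=d_model)
        self.fuse = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.ReLU())
        self.classifier = nn.Linear(d_model, n_answers)

    def forward(self, feats, q_emb):
        v = self.reason(self.visual(feats), q_emb)        # visual-spatial reasoning
        joint = self.fuse(torch.cat([v, q_emb], dim=-1))  # VQ interaction
        return self.classifier(joint)                     # answer logits

if __name__ == "__main__":
    B, N = 2, 49
    feats = [torch.randn(B, N, c) for c in (256, 512, 1024)]  # mock backbone features
    q_emb = torch.randn(B, 512)                                # mock question embedding
    print(SHRNetSketch()(feats, q_emb).shape)                  # torch.Size([2, 100])

The per-scale token maps in the demo stand in for flattened feature maps from a multiscale backbone, and the answer head treats RSVQA as classification over a fixed answer set, consistent with the abstract's description of answer prediction from a joint image-text embedding.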
Pages: 15