A Spatial Hierarchical Reasoning Network for Remote Sensing Visual Question Answering

Cited by: 1
Authors
Zhang, Zixiao [1 ,2 ]
Jiao, Licheng [1 ,2 ]
Li, Lingling [1 ,2 ]
Liu, Xu [1 ,2 ]
Chen, Puhua [1 ,2 ]
Liu, Fang [1 ,2 ]
Li, Yuxuan [1 ,2 ]
Guo, Zhicheng [1 ,2 ]
Affiliations
[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Int Res Ctr Intelligent Percept & Computat, Minist Educ, Joint Int Res Lab Intelligent Percept, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Sch Artificial Intelligence, Xian 710071, Shaanxi, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2023 / Vol. 61
Funding
National Natural Science Foundation of China
Keywords
Visualization; Remote sensing; Cognition; Task analysis; Geospatial analysis; Semantics; Question answering (information retrieval); Attention mechanism; multiscale representation; relational reasoning; visual question answering on remote sensing (RSVQA)
DOI
10.1109/TGRS.2023.3237606
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
For visual question answering on remote sensing (RSVQA), current methods rarely account for the fact that geospatial objects typically differ widely in scale and are sensitive to position. In addition, modeling and reasoning about the relationships between entities have rarely been explored, which leads to one-sided and inaccurate answer predictions. In this article, a novel method called the spatial hierarchical reasoning network (SHRNet) is proposed, which endows a remote sensing (RS) visual question answering (VQA) system with enhanced visual-spatial reasoning capability. Specifically, a hash-based spatial multiscale visual representation module is first designed to encode multiscale visual features embedded with spatial positional information. Then, spatial hierarchical reasoning is conducted to learn the high-order intra-group object relations across multiple scales under the guidance of linguistic cues. Finally, a visual-question (VQ) interaction module is employed to learn an effective image-text joint embedding for the final answer prediction. Experimental results on three public RS VQA datasets confirm the effectiveness and superiority of SHRNet.
Pages: 15