Exploring region features in remote sensing image captioning

被引:5
|
作者
Zhao, Kai [1 ]
Xiong, Wei [2 ]
机构
[1] Space Engn Univ, Beijing 101400, Peoples R China
[2] Space Engn Univ, Sci & Technol Complex Elect Syst Simulat Lab, Beijing 101400, Peoples R China
关键词
Transformer model; Image processing; Model training; Remote sensing image captioning;
D O I
10.1016/j.jag.2024.103672
中图分类号
TP7 [遥感技术];
学科分类号
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
摘要
Remote sensing image captioning (RSIC), an emerging field of cross -modal tasks, has become a popular research topic in recent years. Feature extraction underlies all RSIC tasks, with current tasks using grid features. Compared with grid features, region features provide object -level location -related information; however, these features have not been considered in the RSIC tasks. Therefore, this study examined the performance of region features on RSIC tasks. We generated region annotations based on published RSIC datasets to address the need for region -related datasets. We extracted region features according to the labeled data and proposed a Region Attention Transformer model. To solve the information loss problem owing to the region of interest pooling during region feature extraction, we proposed region -grid features and used geometry relationships for estimating correlations between different region features. We compared the performances of the models using grid and region features. The results showed that region features performed well in RSIC tasks, and region features forced the model to pay more attention to object regions when generating object -related words. This study describes a novel method of using features in RSIC tasks. Our region annotations are available at https://github.com/zk-1019/exploring.
引用
收藏
页数:13
相关论文
共 50 条
  • [21] REMOTE SENSING IMAGE CAPTIONING WITH SVM-BASED DECODING
    Hoxha, Genc
    Melgani, Farid
    IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 6734 - 6737
  • [22] Truncation Cross Entropy Loss for Remote Sensing Image Captioning
    Li, Xuelong
    Zhang, Xueting
    Huang, Wei
    Wang, Qi
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (06): : 5246 - 5257
  • [23] Sound Active Attention Framework for Remote Sensing Image Captioning
    Lu, Xiaoqiang
    Wang, Binqiang
    Zheng, Xiangtao
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2020, 58 (03): : 1985 - 2000
  • [24] Multiscale Methods for Optical Remote-Sensing Image Captioning
    Ma, Xiaofeng
    Zhao, Rui
    Shi, Zhenwei
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2021, 18 (11) : 2001 - 2005
  • [25] Recurrent Attention and Semantic Gate for Remote Sensing Image Captioning
    Li, Yunpeng
    Zhang, Xiangrong
    Gu, Jing
    Li, Chen
    Wang, Xin
    Tang, Xu
    Jiao, Licheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [26] Toward Remote Sensing Image Retrieval Under a Deep Image Captioning Perspective
    Hoxha, Genc
    Melgani, Farid
    Demir, Begum
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2020, 13 : 4462 - 4475
  • [27] Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
    Yang, Cong
    Li, Zuchao
    Zhang, Lefei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 12
  • [28] Prior Knowledge-Guided Transformer for Remote Sensing Image Captioning
    Meng, Lingwu
    Wang, Jing
    Yang, Yang
    Xiao, Liang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61 : 1 - 13
  • [29] LAM: Remote Sensing Image Captioning with Label-Attention Mechanism
    Zhang, Zhengyuan
    Diao, Wenhui
    Zhang, Wenkai
    Yan, Menglong
    Gao, Xin
    Sun, Xian
    REMOTE SENSING, 2019, 11 (20)
  • [30] LEARNING REGION RESPONSE RANKING FEATURES FOR REMOTE SENSING IMAGE SCENE CLASSIFICATION
    Yang, Junyu
    Cheng, Gong
    Yao, Xiwen
    Han, Junwei
    Guo, Lei
    2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019, : 529 - 532