A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions

Citations: 0
Authors
Zhang, Ke [1, 2]
Li, Peijie [1 ]
Wang, Jianqiang [1 ]
Affiliations
[1] North China Elect Power Univ, Dept Elect & Commun Engn, Baoding 071003, Peoples R China
[2] North China Elect Power Univ, Hebei Key Lab Power Internet Things Technol, Baoding 071003, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
remote sensing; image caption; encoder-decoder framework; attention mechanism; reinforcement learning; auxiliary task; large visual language model; few-shot learning; NETWORK; ATTENTION; AGGREGATION; ALGORITHMS; FUSION;
DOI
10.3390/rs16214113
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of the knowledge hidden in these images will greatly promote the development of resource and environmental monitoring, urban planning, and other related fields. Remote sensing image caption (RSIC) involves generating textual descriptions of remote sensing images by accurately capturing and describing the semantic-level relationships between objects and attributes in the images. However, there is currently no comprehensive review summarizing the progress of deep learning-based RSIC. After defining the scope of the papers to be discussed and summarizing them, this paper provides a comprehensive review of recent advancements in RSIC, covering six key aspects: the encoder-decoder framework, attention mechanisms, reinforcement learning, learning with auxiliary tasks, large visual language models, and few-shot learning. Subsequently, a brief explanation of the datasets and evaluation metrics for RSIC is given. Furthermore, we compare and analyze the results of the latest models and the pros and cons of different deep learning methods. Lastly, future directions for RSIC are suggested. The primary objective of this review is to offer researchers a more profound understanding of RSIC.
Pages: 45