A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions

Cited by: 0

Authors:

Zhang, Ke [1,2]

Li, Peijie [1]

Wang, Jianqiang [1]

Affiliations:

[1] North China Elect Power Univ, Dept Elect & Commun Engn, Baoding 071003, Peoples R China

[2] North China Elect Power Univ, Hebei Key Lab Power Internet Things Technol, Baoding 071003, Peoples R China

Source:

REMOTE SENSING | 2024 / Vol. 16 / Issue 21

Funding:

National Natural Science Foundation of China;

Keywords:

remote sensing; image caption; encoder-decoder framework; attention mechanism; reinforcement learning; auxiliary task; large visual language model; few-shot learning; NETWORK; ATTENTION; AGGREGATION; ALGORITHMS; FUSION;

DOI:

10.3390/rs16214113

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of hidden knowledge from these images will greatly promote the development of resource and environment monitoring, urban planning and other related fields. Remote sensing image caption (RSIC) involves obtaining textual descriptions from remote sensing images by accurately capturing and describing the semantic-level relationships between objects and attributes in the images. However, there is currently no comprehensive review summarizing the progress in RSIC based on deep learning. After defining the scope of the papers to be discussed and summarizing them, this paper reviews recent advancements in RSIC across six key aspects: encoder-decoder framework, attention mechanism, reinforcement learning, learning with auxiliary task, large visual language models and few-shot learning. Subsequently, a brief explanation of the datasets and evaluation metrics for RSIC is given. Furthermore, the results of the latest models and the pros and cons of different deep learning methods are compared and analyzed. Lastly, future directions of RSIC are suggested. The primary objective of this review is to offer researchers a more profound understanding of RSIC.

Pages: 45
