A Systematic Literature Review on Using the Encoder-Decoder Models for Image Captioning in English and Arabic Languages

Cited by: 6
Authors
Alsayed, Ashwaq [1]
Arif, Muhammad [1]
Qadah, Thamir M. [1]
Alotaibi, Saud [2]
Affiliations
[1] Umm Al Qura Univ, Comp Sci Dept, Mecca 24230, Saudi Arabia
[2] Umm Al Qura Univ, Informat Syst Dept, Mecca 24230, Saudi Arabia
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 19
Keywords
image captioning; Arabic image captioning; BLEU score; transformer; feature extraction
DOI
10.3390/app131910894
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
With the explosion of visual content on the Internet, creating captions for images has become a necessary task and an exciting topic for many researchers. Furthermore, image captioning is becoming increasingly important as the number of people using social media platforms grows. While there is extensive research on English image captioning (EIC), studies focusing on image captioning in other languages, especially Arabic, are limited, and no systematic survey of Arabic image captioning (AIC) has yet been attempted. This research systematically surveys encoder-decoder EIC with respect to the following aspects: visual model, language model, loss functions, datasets, evaluation metrics, model comparison, and adaptability to the Arabic language. We conduct a systematic review of the literature on EIC and AIC approaches published in the past nine years (2015-2023) and retrieved from well-known databases (Google Scholar, ScienceDirect, IEEE Xplore). We identified 52 primary English and Arabic studies relevant to our objectives: 11 on Arabic captioning and the rest on English. The review shows that English-specific models can be applied to the Arabic language, provided that a high-quality Arabic database is used and appropriate preprocessing is applied. Finally, we discuss some limitations and propose ideas for addressing them as future directions.
Pages: 31