A performance analysis of transformer-based deep learning models for Arabic image captioning

Cited by: 2
Authors
Alsayed, Ashwaq [1]
Qadah, Thamir M. [1]
Arif, Muhammad [1]
Affiliations
[1] Umm Al Qura Univ, Coll Comp & Informat Syst, Comp Sci Dept, Mecca, Saudi Arabia
Keywords
Image captioning; Arabic image captioning; Transformer model; Performance analysis and evaluation; Deep learning; Machine learning; Arabic technologies
DOI
10.1016/j.jksuci.2023.101750
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Image captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work has focused on performing the image captioning task in English, and only a few proposals address the task in Arabic. This paper focuses on understanding the factors that affect the performance of machine learning models performing Arabic image captioning (AIC). In particular, we focus on transformer-based models for AIC and study the impact of various text-preprocessing methods: CAMeL Tools, ArabertPreprocessor, and Stanza. Our study shows that using CAMeL Tools to preprocess text labels improves the AIC performance by up to 34-92% in the BLEU-4 score. In addition, we study the impact of image recognition models. Our results show that ResNet152 is better than EfficientNet-B0 and can improve BLEU scores by 9-11%. Furthermore, we investigate the impact of different datasets on the overall AIC performance and build an extended version of the Arabic Flickr8k dataset. Using the extended version improves the BLEU-4 score of the AIC model by up to 148%. Finally, utilizing our results, we build a model that significantly outperforms the state-of-the-art proposals in AIC by up to 196-379% in the BLEU-4 score. © 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Pages: 16