AraCap: A hybrid deep learning architecture for Arabic Image Captioning

Cited by: 8
Authors
Afyouni, Imad [1 ]
Azhar, Imtinan [1 ]
Elnagar, Ashraf [1 ]
Affiliation
[1] Univ Sharjah, Dept Comp Sci, Sharjah, U Arab Emirates
Source
AI IN COMPUTATIONAL LINGUISTICS | 2021 / Vol. 189
Keywords
Image Captioning; Arabic Language; Deep Learning; Object Detection; Attention Mechanism;
DOI
10.1016/j.procs.2021.05.108
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic captioning of images not only enriches multimedia content with descriptive features, but also helps in detecting patterns, trends, and events of interest. Arabic image caption generation, in particular, is a very challenging topic in machine learning. This paper presents AraCap, a hybrid object-based, attention-enriched image captioning architecture with a focus on the Arabic language. Three models are demonstrated; all are implemented and trained on the COCO and Flickr30k datasets and then tested on an Arabic version of a subset of the COCO dataset. The first model is an object-based captioner that can handle one or multiple detected objects. The second is a combined pipeline that uses both an object detector and attention-based captioning, while the third is based on a pure soft-attention mechanism. The models are evaluated using multilingual semantic sentence similarity techniques to assess the accuracy of the generated captions against the ground-truth captions. Results show that the similarity scores of the Arabic captions generated by all three proposed models outperform those of the basic captioning technique. (C) 2021 The Authors. Published by Elsevier B.V.
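The soft-attention mechanism mentioned in the abstract can be illustrated with a minimal NumPy sketch of Bahdanau-style additive attention, where a decoder state attends over image region features. All names, dimensions, and weight matrices below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, v):
    """Additive (Bahdanau-style) soft attention over image region features.

    features: (R, D) region features, e.g. from a CNN or object detector
    hidden:   (H,)   decoder hidden state at the current time step
    Returns the context vector (D,) and the attention weights (R,).
    """
    # Score each region against the decoder state (additive attention).
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v   # (R,)
    # Softmax over regions (shifted by the max for numerical stability).
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: attention-weighted sum of region features.
    context = weights @ features                          # (D,)
    return context, weights

# Toy dimensions and random parameters for demonstration only.
rng = np.random.default_rng(0)
R, D, H, A = 5, 8, 6, 4  # regions, feature dim, hidden dim, attention dim
features = rng.normal(size=(R, D))
hidden = rng.normal(size=(H,))
W_f = rng.normal(size=(D, A))
W_h = rng.normal(size=(H, A))
v = rng.normal(size=(A,))

context, weights = soft_attention(features, hidden, W_f, W_h, v)
print(weights)  # one weight per region; they sum to 1
```

In a full captioner, the context vector would be concatenated with the word embedding and fed to an RNN/LSTM decoder at each step; here only the attention step itself is shown.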
Pages: 382-389
Page count: 8
Cited References
15 in total
[1]   Deep learning for Arabic NLP: A survey [J].
Al-Ayyoub, Mahmoud ;
Nuseir, Aya ;
Alsmearat, Kholoud ;
Jararweh, Yaser ;
Gupta, Brij .
JOURNAL OF COMPUTATIONAL SCIENCE, 2018, 26 :522-531
[2]  
Al-Muzaini HA, 2018, INT J ADV COMPUT SC, V9, P67
[3]  
Cheikh Moustapha, 2020, Learning and Intelligent Optimization. 14th International Conference, LION 14. Revised Selected Papers. Lecture Notes in Computer Science (LNCS 12096), P128, DOI 10.1007/978-3-030-53552-0_14
[4]  
Chen Xinlei, 2015, CORR
[5]   Resources and End-to-End Neural Network Models for Arabic Image Captioning [J].
ElJundi, Obeida ;
Dhaybi, Mohamad ;
Mokadam, Kotaiba ;
Hajj, Hazem ;
Asmar, Daniel .
PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VOL 5: VISAPP, 2020, :233-241
[6]   A Comprehensive Survey of Deep Learning for Image Captioning [J].
Hossain, Md Zakir ;
Sohel, Ferdous ;
Shiratuddin, Mohd Fairuz ;
Laga, Hamid .
ACM COMPUTING SURVEYS, 2019, 51 (06)
[7]  
Jindal V, 2018, P AAAI C ART INT 32
[8]   Microsoft COCO: Common Objects in Context [J].
Lin, Tsung-Yi ;
Maire, Michael ;
Belongie, Serge ;
Hays, James ;
Perona, Pietro ;
Ramanan, Deva ;
Dollar, Piotr ;
Zitnick, C. Lawrence .
COMPUTER VISION - ECCV 2014, PT V, 2014, 8693 :740-755
[9]  
Mualla R., 2018, International Journal of Computer Science Trends and Technology (IJCST), V6, P205
[10]   Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models [J].
Plummer, Bryan A. ;
Wang, Liwei ;
Cervantes, Chris M. ;
Caicedo, Juan C. ;
Hockenmaier, Julia ;
Lazebnik, Svetlana .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 123 (01) :74-93