A Comparative Study on Deep CNN Visual Encoders for Image Captioning

Cited by: 0
Authors
Arun, M. [1]
Arivazhagan, S. [1]
Harinisri, R. [1]
Raghavi, P. S. [1]
Affiliations
[1] Mepco Schlenk Engn Coll, Dept Elect & Commun Engn, Sivakasi, Tamil Nadu, India
Source
COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III | 2024 / Vol. 2011
Keywords
Image Captioning; Flickr8K; Visual Encoding; BLEU; Flickr30K
DOI
10.1007/978-3-031-58535-7_2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image captioning is the process of describing an image in syntactically and semantically meaningful terms. An image caption generator integrates computer vision with natural language processing. Although numerous techniques for generating image captions have been developed, their results remain inadequate, and the area still demands further research. Humans describe an image by seeing, focusing, and captioning, which corresponds to feature representation, visual encoding, and language generation in an image captioning system. This study presents the construction of a simple deep learning-based image captioning model and investigates the efficacy of the different visual encoding methods employed in it. We analyze and compare the performance of six pre-trained CNN visual encoding models using Bilingual Evaluation Understudy (BLEU) scores.
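The abstract names BLEU as the comparison metric but does not say how it was computed. As an illustration only, the following is a minimal sketch of the standard sentence-level BLEU definition (clipped n-gram precision, geometric mean over 1- to 4-grams, brevity penalty), written without smoothing. The function names `ngrams`, `modified_precision`, and `bleu` are hypothetical and are not taken from the paper; the authors may have used a library implementation with different settings.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against multiple references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip each candidate count so repeated words cannot inflate the score.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (assumed)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties: the shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    generated = "a dog runs across the green grass".split()
    refs = ["a dog runs across the grass".split(),
            "a brown dog is running on the green grass".split()]
    print(f"BLEU-4: {bleu(generated, refs):.3f}")
```

In a comparison of encoders such as the one the abstract describes, a score like this would typically be computed for each generated caption against the dataset's reference captions and then aggregated per encoder; the exact aggregation used in the paper is not stated in this record.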
Pages: 14-26
Page count: 13