Generating Image Captions Using Bahdanau Attention Mechanism and Transfer Learning

Cited: 39
Authors
Ayoub, Shahnawaz [1 ]
Gulzar, Yonis [2 ]
Reegu, Faheem Ahmad [3 ]
Turaev, Sherzod [4 ]
Affiliations
[1] Shri Venkateshwara Univ, Dept Comp Sci & Engn, NH-24, Gajraula 244236, Uttar Pradesh, India
[2] King Faisal Univ, Coll Business Adm, Dept Management Informat Syst, Al Hasa 31982, Saudi Arabia
[3] Jazan Univ, Dept Comp Sci & Informat Technol, Jazan 45142, Saudi Arabia
[4] United Arab Emirates Univ, Coll Informat Technol, Dept Comp Sci & Software Engn, Al Ain 15551, U Arab Emirates
Source
SYMMETRY-BASEL | 2022, Vol. 14, No. 12
Keywords
image captioning; convolutional neural network; Bahdanau attention mechanism; natural language processing
DOI
10.3390/sym14122681
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Automatic image caption prediction is a challenging task in natural language processing. Most researchers have used convolutional neural networks as encoders and decoders. However, accurate image caption prediction requires a model that understands the semantic relationships between the various objects present in an image. The attention mechanism computes a linear combination of encoder and decoder states, aligning the semantic information in the caption with the visual information in the image. In this paper, we incorporate the Bahdanau attention mechanism with two pre-trained convolutional neural networks, Visual Geometry Group (VGG) and InceptionV3, to predict the captions of a given image. The two pre-trained models serve as encoders and a recurrent neural network serves as the decoder. With the help of the attention mechanism, the two encoders provide semantic context information to the decoder and achieve a bilingual evaluation understudy (BLEU) score of 62.5. Our main goal is to compare the performance of the two pre-trained models, each combined with the Bahdanau attention mechanism, on the same dataset.
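To make the mechanism concrete, below is a minimal sketch of Bahdanau (additive) attention paired with a transfer-learning encoder, written in TensorFlow/Keras; the class name, the units parameter, and the tensor shapes are illustrative assumptions, not the authors' published code. The decoder's previous hidden state and the CNN feature map are projected into a shared space, scored through a tanh nonlinearity, and softmax-normalized into weights whose weighted sum forms the context vector, i.e., the linear combination of encoder and decoder states described above.

    import tensorflow as tf

    class BahdanauAttention(tf.keras.layers.Layer):
        # Additive attention: score(h_i, s) = v^T tanh(W1 h_i + W2 s).
        def __init__(self, units):
            super().__init__()
            self.W1 = tf.keras.layers.Dense(units)  # projects CNN encoder features
            self.W2 = tf.keras.layers.Dense(units)  # projects RNN decoder state
            self.V = tf.keras.layers.Dense(1)       # one score per spatial location

        def call(self, features, hidden):
            # features: (batch, locations, feature_dim) from the CNN encoder
            # hidden:   (batch, units), the previous decoder hidden state
            hidden_expanded = tf.expand_dims(hidden, 1)            # (batch, 1, units)
            scores = self.V(tf.nn.tanh(
                self.W1(features) + self.W2(hidden_expanded)))     # (batch, locations, 1)
            weights = tf.nn.softmax(scores, axis=1)                # attention weights over locations
            context = tf.reduce_sum(weights * features, axis=1)    # weighted sum = context vector
            return context, weights

    # Transfer-learning encoder (illustrative): InceptionV3 pre-trained on ImageNet,
    # with the classification head removed so it emits a spatial feature map.
    base = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
    encoder = tf.keras.Model(base.input, base.output)  # (batch, 8, 8, 2048) for 299x299 inputs

In a full captioning model, the 8x8 feature map would be reshaped to (batch, 64, 2048) before being passed as features, and the returned context vector would be concatenated with the current word embedding at each RNN decoding step; swapping the base model for VGG changes only the feature-map shape.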
Pages: 19