Lighting Search Algorithm With Convolutional Neural Network-Based Image Captioning System for Natural Language Processing

Cited by: 1
Authors
Alnashwan, Rana Othman [1]
Chelloug, Samia Allaoua [1]
Almalki, Nabil Sharaf [2]
Issaoui, Imene [3]
Motwakel, Abdelwahed [4]
Sayed, Ahmed [5]
Affiliations
[1] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, POB 84428, Riyadh 11671, Saudi Arabia
[2] King Saud Univ, Coll Educ, Dept Special Educ, Riyadh 12372, Saudi Arabia
[3] Qassim Univ, Appl Coll, Unit Sci Res, Buraydah 52571, Saudi Arabia
[4] Prince Sattam Bin Abdulaziz Univ, Coll Business Adm Hawtat Bani Tamim, Dept Informat Syst, Al Kharj 16278, Saudi Arabia
[5] Future Univ Egypt, Res Ctr, New Cairo 11835, Egypt
Keywords
Feature extraction; Convolutional neural networks; Visualization; Decoding; Natural language processing; Tuning; Deep convolutional neural network; Image captioning; Machine learning; Hyperparameter tuning
DOI
10.1109/ACCESS.2023.3342703
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deep learning models have recently become prominent because of their strong performance on real-time tasks such as face recognition, object detection, natural language processing (NLP), instance segmentation, image classification, gesture recognition, and video classification. Image captioning is a critical task at the intersection of NLP and computer vision (CV): it converts an image into text, with the model automatically producing a description of the input image. This article develops a Lighting Search Algorithm (LSA) with a Hybrid Convolutional Neural Network Image Captioning System (LSAHCNN-ICS) for NLP. The LSAHCNN-ICS system is an end-to-end model that employs the convolutional neural network (CNN) based ShuffleNet as the encoder and a hybrid convolutional neural network (HCNN) as the decoder. In the encoding part, the ShuffleNet model derives feature descriptors of the image. In the decoding part, the proposed HCNN model generates the text description. To improve the captioning results, the LSA is applied as a hyperparameter tuning strategy, which represents the innovation of the study. The LSAHCNN-ICS technique is evaluated by simulation on benchmark datasets, and the results show improved outcomes over other recent methods, with maximum Consensus-based Image Description Evaluation (CIDEr) scores of 43.60, 59.54, and 135.14 on the Flickr8k, Flickr30k, and MSCOCO datasets, respectively.
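The hyperparameter tuning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' LSA: it uses a generic population-based search as a simplified stand-in, and the function names (`tune`, `toy_score`), the hyperparameters (`learning_rate`, `dropout`), and their bounds are all hypothetical. In a real system the objective would train the encoder-decoder model and return a validation captioning score such as CIDEr.

```python
import random


def tune(objective, bounds, pop_size=8, iters=30, seed=0):
    """Generic population-based search (a simplified stand-in, NOT the actual LSA)."""
    rng = random.Random(seed)
    # Start from random candidates drawn uniformly inside the bounds.
    pop = [{k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
           for _ in range(pop_size)]
    best = max(pop, key=objective)
    for _ in range(iters):
        new_pop = []
        for cand in pop:
            trial = {}
            for k, (lo, hi) in bounds.items():
                # Move halfway toward the best candidate, add Gaussian noise,
                # then clip the value back into the allowed range.
                step = rng.gauss(0.0, 0.1 * (hi - lo))
                value = cand[k] + 0.5 * (best[k] - cand[k]) + step
                trial[k] = min(hi, max(lo, value))
            # Keep the trial only if it scores at least as well as its parent.
            new_pop.append(trial if objective(trial) >= objective(cand) else cand)
        pop = new_pop
        best = max(pop + [best], key=objective)
    return best


def toy_score(h):
    # Hypothetical stand-in for a validation captioning score;
    # it peaks at learning_rate=0.01 and dropout=0.3.
    return -1e4 * (h["learning_rate"] - 0.01) ** 2 - (h["dropout"] - 0.3) ** 2


# Assumed search space, chosen only for illustration.
BOUNDS = {"learning_rate": (1e-4, 1e-1), "dropout": (0.0, 0.6)}

best = tune(toy_score, BOUNDS)
print(best)
```

Because a trial replaces its parent only when it scores at least as well, and the best candidate is carried across iterations, the returned configuration is never worse than the best member of the initial random population.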
Pages: 142643-142651
Page count: 9