Enhancing Thai Image Captioning Performance using CNN and Bidirectional LSTM

Cited by: 0

Authors:

Tieancho, Witchaphon [1]

Phumeechanya, Sopon [1]

Affiliations:

[1] Silpakorn Univ, Fac Engn & Ind Technol, Dept Elect Engn, Nakhon Pathom, Thailand

Source:

2024 21ST INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING/ELECTRONICS, COMPUTER, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY, ECTI-CON 2024 | 2024

Keywords:

Thai captions; convolutional neural network; bidirectional LSTM; BLEU

DOI:

10.1109/ECTI-CON60892.2024.10595011

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This research designs a deep learning model that generates Thai captions: a convolutional neural network (CNN) with the VGG16 architecture extracts image features, and a bidirectional LSTM generates captions from them. The dataset used for training and testing is Flickr8k, combined with customized traffic-related images and captions. For the Flickr8k portion, all captions were translated from English to Thai using Google Translate, and as preprocessing before training, special characters were removed to prevent the Thai descriptions from being distorted. To evaluate the generated captions against the reference captions, the BLEU metric was used, with scores compared for n-grams up to 4-grams. The resulting average score was higher than that of the compared models.
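The preprocessing and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`clean_caption`, `modified_precision`, `bleu`) are hypothetical, the set of characters kept during cleaning is an assumption, and captions are assumed to be already split into word tokens (Thai has no spaces between words, so a separate word segmenter would be needed in practice). The score is a plain, unsmoothed sentence-level BLEU with uniform weights.

```python
import math
import re
from collections import Counter

# Assumption: keep Thai characters (U+0E00-U+0E7F), ASCII letters, digits, whitespace.
_NOT_ALLOWED = re.compile(r"[^\u0E00-\u0E7Fa-zA-Z0-9\s]")


def clean_caption(text):
    """Drop special characters, then collapse runs of whitespace."""
    return re.sub(r"\s+", " ", _NOT_ALLOWED.sub("", text)).strip()


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def bleu(candidate, references, max_n=4):
    """Cumulative BLEU-max_n with uniform weights and brevity penalty (no smoothing)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Calling `bleu(candidate, references, max_n=n)` for n from 1 to 4 gives the BLEU-1 through BLEU-4 values that a comparison "up to 4-grams" would report; averaging over a test set is left out of the sketch.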


Pages: 5
