Enhancing Thai Image Captioning Performance using CNN and Bidirectional LSTM

Authors
Tieancho, Witchaphon [1]
Phumeechanya, Sopon [1]
Affiliation
[1] Silpakorn Univ, Fac Engn & Ind Technol, Dept Elect Engn, Nakhon Pathom, Thailand
Source
2024 21ST INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING/ELECTRONICS, COMPUTER, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY, ECTI-CON 2024 | 2024
Keywords
Thai captions; convolutional neural network; bidirectional LSTM; BLEU;
DOI
10.1109/ECTI-CON60892.2024.10595011
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This research designs a deep learning model that generates Thai image captions. A convolutional neural network (CNN) with the VGG16 architecture extracts image features, and a bidirectional LSTM generates captions from them. The dataset used for training and testing is Flickr8k combined with a custom set of traffic-related images and captions. For the first dataset, Flickr8k, all captions were translated from English to Thai using Google Translate. Preprocessing before training included removing special characters to prevent the Thai descriptions from being distorted. The generated captions were then evaluated against the reference captions using the BLEU metric. The resulting average score was higher than those of the compared models. Scores were compared for n-grams of up to 4 grams.
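The preprocessing and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The names `clean_caption`, `modified_precision`, and `bleu` are hypothetical. The character whitelist (Thai Unicode block, ASCII letters and digits, whitespace) is an assumption, since the abstract does not say which special characters were removed. The BLEU function is a plain sentence-level version with uniform weights and no smoothing, and it assumes captions are already split into tokens; Thai is written without spaces between words, so a word segmenter would be needed first, and the abstract does not name one.

```python
import math
import re
from collections import Counter

# Assumption: keep Thai characters (U+0E00-U+0E7F), ASCII letters,
# digits, and whitespace; replace everything else with a space.
_SPECIAL = re.compile(r"[^\u0E00-\u0E7F0-9A-Za-z\s]")


def clean_caption(text):
    """Remove special characters and collapse runs of whitespace."""
    return " ".join(_SPECIAL.sub(" ", text).split())


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against its references."""
    cand = ngrams(candidate, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-max_n, uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Calling `bleu(tokens, [reference_tokens], max_n=n)` for `n` from 1 to 4 gives the BLEU-1 through BLEU-4 values that a comparison "up to 4 grams" would report. Without smoothing, a short caption with no matching 4-gram scores 0 at BLEU-4, which is one reason published results are usually averaged over a corpus.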
Pages: 5