Enhancing THAI Image Captioning Performance using CNN and Bidirectional LSTM

Cited by: 0
Authors
Tieancho, Witchaphon [1]
Phumeechanya, Sopon [1]
Affiliations
[1] Silpakorn Univ, Fac Engn & Ind Technol, Dept Elect Engn, Nakhon Pathom, Thailand
Source
2024 21ST INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING/ELECTRONICS, COMPUTER, TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY, ECTI-CON 2024 | 2024
Keywords
Thai captions; convolutional neural network; bidirectional LSTM; BLEU
DOI
10.1109/ECTI-CON60892.2024.10595011
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This research designs a deep learning model that generates Thai image captions. A VGG16 convolutional neural network (CNN) extracts image features, and a bidirectional LSTM generates the captions from them. The dataset used for training and testing is Flickr8k combined with custom traffic-related images and captions. All Flickr8k captions were translated from English to Thai using Google Translate, and special characters were removed before training to prevent the Thai descriptions from being distorted. The generated captions were evaluated against the reference captions using the BLEU metric, with scores compared for n-grams up to 4. The resulting average score was higher than those of the compared models.
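The preprocessing and evaluation steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`clean_caption`, `modified_precision`, `bleu`) are hypothetical, and the rule for what counts as a "special character" (anything outside Thai script, ASCII letters, digits, and whitespace) is an assumption, since the abstract does not state it. The BLEU function is the standard unsmoothed sentence-level form with uniform weights and a brevity penalty. It takes pre-tokenized word lists, because Thai text does not separate words with spaces and would need a word segmenter first.

```python
import math
import re
from collections import Counter

# Assumption: keep Thai script (U+0E00-U+0E7F), ASCII letters, digits, whitespace.
_NOT_ALLOWED = re.compile(r"[^\u0E00-\u0E7Fa-zA-Z0-9\s]")


def clean_caption(text):
    """Replace special characters with spaces and collapse whitespace."""
    return " ".join(_NOT_ALLOWED.sub(" ", text).split())


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against all references."""
    cand = ngrams(candidate, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights up to max_n-grams."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Closest reference length; ties go to the shorter reference.
    r = min((len(ref) for ref in references), key=lambda n_ref: (abs(n_ref - c), n_ref))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)
```

Calling `bleu(candidate, references, max_n=k)` for `k` from 1 to 4 gives the cumulative BLEU-1 through BLEU-4 scores commonly reported for captioning models. In practice a corpus-level score with smoothing, as provided by established toolkits, is usually preferable to this per-sentence form.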
Pages: 5