Image captioning model using attention and object features to mimic human image understanding

Cited by: 0
Authors
Muhammad Abdelhadie Al-Malla
Assef Jafar
Nada Ghneim
Affiliations
[1] Higher Institute for Applied Sciences and Technology
[2] Arab International University
Source
Journal of Big Data, Vol. 9
Keywords
Image captioning; Object features; Convolutional neural network; Deep learning
DOI
Not available
Abstract
Image captioning spans the fields of computer vision and natural language processing. The image captioning task can be seen as a generalization of object detection, in which the descriptions are single words rather than full sentences. Recently, most research on image captioning has focused on deep learning techniques, especially Encoder-Decoder models with Convolutional Neural Network (CNN) feature extraction. However, few works have tried using object detection features to improve the quality of the generated captions. This paper presents an attention-based Encoder-Decoder deep architecture that makes use of convolutional features extracted from a CNN model pre-trained on ImageNet (Xception), together with object features extracted from the YOLOv4 model, pre-trained on MS COCO. This paper also introduces a new positional encoding scheme for object features, the "importance factor". Our model was tested on the MS COCO and Flickr30k datasets, and its performance is compared with that of similar works. Our new feature extraction scheme raises the CIDEr score by 15.04%. The code is available at: https://github.com/abdelhadie-almalla/image_captioning
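The architecture described above can be sketched as a feature-fusion step: CNN grid features and per-object detection features are stacked into one sequence the decoder attends over. The abstract does not give the exact "importance factor" formula, so the score below (detection confidence times relative box area) is an illustrative assumption, as are all dimensions, names, and the zero-padding scheme.

```python
import numpy as np

def object_feature(box, confidence, class_emb, image_wh=(416, 416)):
    """Encode one detected object: class embedding + normalized box geometry
    + an assumed importance score that orders objects by confidence and size."""
    x, y, w, h = box
    rel_area = (w * h) / (image_wh[0] * image_wh[1])
    importance = confidence * rel_area  # hypothetical "importance factor"
    geom = np.array([x / image_wh[0], y / image_wh[1],
                     w / image_wh[0], h / image_wh[1], importance])
    return np.concatenate([class_emb, geom])

def build_encoder_input(grid_features, detections, image_wh=(416, 416)):
    """Stack flattened CNN grid features (e.g. an Xception feature map) with
    projected object features so an attention decoder can attend over both."""
    obj_feats = np.stack([object_feature(b, c, e, image_wh)
                          for b, c, e in detections])
    # Zero-pad object vectors up to the CNN feature width before stacking.
    pad = grid_features.shape[1] - obj_feats.shape[1]
    obj_feats = np.pad(obj_feats, ((0, 0), (0, pad)))
    return np.concatenate([grid_features, obj_feats], axis=0)

grid = np.random.rand(100, 2048)                       # 10x10 grid, 2048-d
dets = [((50, 60, 100, 80), 0.9, np.random.rand(64)),  # (box, conf, class emb)
        ((10, 10, 30, 30), 0.6, np.random.rand(64))]
enc_in = build_encoder_input(grid, dets)
print(enc_in.shape)  # (102, 2048)
```

In this sketch the decoder sees 100 grid positions plus one row per detected object; the importance score travels with each object vector so attention can weight salient objects higher.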
Related Papers (50 total)
  • [1] Image captioning model using attention and object features to mimic human image understanding
    Al-Malla, Muhammad Abdelhadie
    Jafar, Assef
    Ghneim, Nada
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [2] Object Relation Attention for Image Paragraph Captioning
    Yang, Li-Chuan
    Yang, Chih-Yuan
    Hsu, Jane Yung-jen
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 3136 - 3144
  • [3] Attention-Based Image Captioning Using DenseNet Features
    Hossain, Md Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    Bennamoun, Mohammed
    NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V, 2019, 1143 : 109 - 117
  • [4] Feedback Attention Model for Image Captioning
    Lyu, F.
    Hu, F.
    Zhang, Y.
    Xia, Z.
    Sheng, V. S.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2019, 31 (07): 1122 - 1129
  • [5] Object-aware semantics of attention for image captioning
    Wang, Shiwei
    Lan, Long
    Zhang, Xiang
    Dong, Guohua
    Luo, Zhigang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (3-4) : 2013 - 2030
  • [6] Object-aware semantics of attention for image captioning
    Shiwei Wang
    Long Lan
    Xiang Zhang
    Guohua Dong
    Zhigang Luo
    Multimedia Tools and Applications, 2020, 79 : 2013 - 2030
  • [7] Boosted Attention: Leveraging Human Attention for Image Captioning
    Chen, Shi
    Zhao, Qi
    COMPUTER VISION - ECCV 2018, PT XI, 2018, 11215 : 72 - 88
  • [8] Attention on Attention for Image Captioning
    Huang, Lun
    Wang, Wenmin
    Chen, Jie
    Wei, Xiao-Yong
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4633 - 4642
  • [9] REFINING ATTENTION: A SEQUENTIAL ATTENTION MODEL FOR IMAGE CAPTIONING
    Fang, Fang
    Li, Qinyu
    Wang, Hanli
    Tang, Pengjie
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [10] Image Captioning using Visual Attention and Detection Transformer Model
    Eluri, Yaswanth
    Vinutha, N.
    Jeevika, M.
    Sree, Sai Bhavya N.
    Abhiram, G. Surya
    10TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTING AND COMMUNICATION TECHNOLOGIES, CONECCT 2024, 2024,