Image captioning model using attention and object features to mimic human image understanding

Cited by: 0
Authors
Muhammad Abdelhadie Al-Malla
Assef Jafar
Nada Ghneim
Affiliations
[1] Higher Institute for Applied Sciences and Technology
[2] Arab International University
Source
Journal of Big Data, Vol. 9
Keywords
Image captioning; Object features; Convolutional neural network; Deep learning
DOI
Not available
Abstract
Image captioning spans the fields of computer vision and natural language processing. The image captioning task can be seen as a generalization of object detection, in which the descriptions are single words rather than full sentences. Recently, most research on image captioning has focused on deep learning techniques, especially Encoder-Decoder models with Convolutional Neural Network (CNN) feature extraction. However, few works have tried using object detection features to improve the quality of the generated captions. This paper presents an attention-based Encoder-Decoder deep architecture that makes use of convolutional features extracted from a CNN model pre-trained on ImageNet (Xception), together with object features extracted from the YOLOv4 model, pre-trained on MS COCO. This paper also introduces a new positional encoding scheme for object features, the "importance factor". Our model was tested on the MS COCO and Flickr30k datasets, and its performance is compared with that of similar works. Our new feature extraction scheme raises the CIDEr score by 15.04%. The code is available at: https://github.com/abdelhadie-almalla/image_captioning
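The architecture described above can be sketched as a feature-fusion step: CNN grid features and per-object detection features are stacked into one sequence the decoder attends over. The abstract does not give the exact "importance factor" formula, so the score below (detection confidence times relative box area) is an illustrative assumption, as are all dimensions, names, and the zero-padding scheme.

```python
import numpy as np

def object_feature(box, confidence, class_emb, image_wh=(416, 416)):
    """Encode one detected object: class embedding + normalized box geometry
    + an assumed importance score that orders objects by confidence and size."""
    x, y, w, h = box
    rel_area = (w * h) / (image_wh[0] * image_wh[1])
    importance = confidence * rel_area  # hypothetical "importance factor"
    geom = np.array([x / image_wh[0], y / image_wh[1],
                     w / image_wh[0], h / image_wh[1], importance])
    return np.concatenate([class_emb, geom])

def build_encoder_input(grid_features, detections, image_wh=(416, 416)):
    """Stack flattened CNN grid features (e.g. an Xception feature map) with
    projected object features so an attention decoder can attend over both."""
    obj_feats = np.stack([object_feature(b, c, e, image_wh)
                          for b, c, e in detections])
    # Zero-pad object vectors up to the CNN feature width before stacking.
    pad = grid_features.shape[1] - obj_feats.shape[1]
    obj_feats = np.pad(obj_feats, ((0, 0), (0, pad)))
    return np.concatenate([grid_features, obj_feats], axis=0)

grid = np.random.rand(100, 2048)                       # 10x10 grid, 2048-d
dets = [((50, 60, 100, 80), 0.9, np.random.rand(64)),  # (box, conf, class emb)
        ((10, 10, 30, 30), 0.6, np.random.rand(64))]
enc_in = build_encoder_input(grid, dets)
print(enc_in.shape)  # (102, 2048)
```

In this sketch the decoder sees 100 grid positions plus one row per detected object; the importance score travels with each object vector so attention can weight salient objects higher.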
Related Papers (50 total)
  • [1] Image captioning model using attention and object features to mimic human image understanding
    Al-Malla, Muhammad Abdelhadie
    Jafar, Assef
    Ghneim, Nada
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [2] Object Relation Attention for Image Paragraph Captioning
    Yang, Li-Chuan
    Yang, Chih-Yuan
    Hsu, Jane Yung-jen
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 3136 - 3144
  • [3] Attention-Based Image Captioning Using DenseNet Features
    Hossain, Md Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    Bennamoun, Mohammed
    NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V, 2019, 1143 : 109 - 117
  • [4] Feedback Attention Model for Image Captioning
    Lyu, F.
    Hu, F.
    Zhang, Y.
    Xia, Z.
    Sheng, V. S.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2019, 31 (07): 1122 - 1129
  • [5] Object-aware semantics of attention for image captioning
    Wang, Shiwei
    Lan, Long
    Zhang, Xiang
    Dong, Guohua
    Luo, Zhigang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (3-4) : 2013 - 2030
  • [6] Object-aware semantics of attention for image captioning
    Shiwei Wang
    Long Lan
    Xiang Zhang
    Guohua Dong
    Zhigang Luo
    Multimedia Tools and Applications, 2020, 79 : 2013 - 2030
  • [7] Boosted Attention: Leveraging Human Attention for Image Captioning
    Chen, Shi
    Zhao, Qi
    COMPUTER VISION - ECCV 2018, PT XI, 2018, 11215 : 72 - 88
  • [8] Attention on Attention for Image Captioning
    Huang, Lun
    Wang, Wenmin
    Chen, Jie
    Wei, Xiao-Yong
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4633 - 4642
  • [9] REFINING ATTENTION: A SEQUENTIAL ATTENTION MODEL FOR IMAGE CAPTIONING
    Fang, Fang
    Li, Qinyu
    Wang, Hanli
    Tang, Pengjie
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [10] Image Captioning using Visual Attention and Detection Transformer Model
    Eluri, Yaswanth
    Vinutha, N.
    Jeevika, M.
    Sree, Sai Bhavya N.
    Abhiram, G. Surya
    10TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTING AND COMMUNICATION TECHNOLOGIES, CONECCT 2024, 2024,