Collaborative Learning Method for Natural Image Captioning

Cited by: 0

Authors:

Wang, Rongzhao [1]

Liu, Libo [1]

Affiliation:

[1] Ningxia Univ, Sch Informat Engn, Yinchuan, Peoples R China

Source:

DATA SCIENCE (ICPCSEE 2022), PT I | 2022, Vol. 1628

Keywords:

Image captioning; Pix2pix inverting; Collaborative learning

DOI:

10.1007/978-981-19-5194-7_19

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

We propose a collaborative learning method for natural image captioning. Many existing methods use CNNs pretrained for image classification to obtain feature representations for caption generation, which ignores the gap in image feature representations between different computer vision tasks. To address this problem, our method exploits the similarity between the image captioning and pix-to-pix inverting tasks to narrow this feature representation gap. Specifically, our framework consists of two modules: 1) the pix2pix module (P2PM), which has a shared feature extractor that extracts feature representations and a U-Net architecture that encodes the image into a latent code and then decodes it back to the original image; and 2) the natural language generation module (NLGM), which generates descriptions from the feature representations extracted by P2PM. Consequently, both the feature representations and the generated image captions improve during the collaborative learning process. Experimental results on the MSCOCO 2017 dataset demonstrate the effectiveness of our approach relative to the compared methods.
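
The two-module arrangement described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no loss functions, layer sizes, or training details, so everything here is an assumption for illustration. A single linear layer with ReLU stands in for the shared feature extractor, a linear decoder stands in for the U-Net reconstruction path, a bag-of-words softmax stands in for the language generation module, and the names `shared_extractor`, `joint_loss`, and `recon_weight` are hypothetical. The only point it shows is that both task losses depend on the same extracted features, so optimizing their sum would update the shared extractor from both tasks.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def shared_extractor(W_feat, pixels):
    # Assumption: linear layer + ReLU in place of the shared CNN extractor.
    return [max(0.0, h) for h in matvec(W_feat, pixels)]


def reconstruction_loss(W_dec, feats, pixels):
    # Assumption: mean squared error between the decoded and original image.
    recon = matvec(W_dec, feats)
    return sum((r - p) ** 2 for r, p in zip(recon, pixels)) / len(pixels)


def caption_loss(W_vocab, feats, token_ids):
    # Assumption: bag-of-words cross-entropy, ignoring word order entirely.
    logits = matvec(W_vocab, feats)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(logits[t] - log_z for t in token_ids) / len(token_ids)


def joint_loss(params, pixels, token_ids, recon_weight=1.0):
    """Both losses are computed from the same shared features."""
    feats = shared_extractor(params["feat"], pixels)
    l_rec = reconstruction_loss(params["dec"], feats, pixels)
    l_cap = caption_loss(params["vocab"], feats, token_ids)
    return l_cap + recon_weight * l_rec, l_cap, l_rec


rng = random.Random(0)
params = {
    "feat": rand_matrix(8, 16, rng),   # 16 "pixels" -> 8 shared features
    "dec": rand_matrix(16, 8, rng),    # 8 features -> 16 reconstructed pixels
    "vocab": rand_matrix(5, 8, rng),   # 8 features -> 5-word toy vocabulary
}
pixels = [rng.random() for _ in range(16)]
tokens = [1, 3, 4]                     # toy caption as vocabulary indices

total, l_cap, l_rec = joint_loss(params, pixels, tokens)
print(f"caption loss={l_cap:.4f}  reconstruction loss={l_rec:.4f}  total={total:.4f}")
```

Setting `recon_weight=0.0` in this sketch reduces the objective to the captioning loss alone, which corresponds to training the caption head without the reconstruction task.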

Pages: 249-261

Page count: 13
