Collaborative Learning Method for Natural Image Captioning

Cited by: 0

Authors:

Wang, Rongzhao [1]

Liu, Libo [1]

Affiliations:

[1] Ningxia Univ, Sch Informat Engn, Yinchuan, Peoples R China

Source:

DATA SCIENCE (ICPCSEE 2022), PT I | 2022 / Vol. 1628

Keywords:

Image captioning; Pix2pix inverting; Collaborative learning

DOI:

10.1007/978-981-19-5194-7_19

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We propose a collaborative learning method for natural image captioning. Many existing methods use CNNs pretrained for image classification to obtain feature representations for caption generation, which ignores the gap in image feature representations between different computer vision tasks. To address this problem, our method exploits the similarity between the image captioning and pix-to-pix inverting tasks to narrow this gap. Specifically, our framework consists of two modules: 1) the pix2pix module (P2PM), which has a shared feature extractor that extracts feature representations and a U-net architecture that encodes the image into a latent code and then decodes it back to the original image; and 2) the natural language generation module (NLGM), which generates descriptions from the feature representations extracted by P2PM. Consequently, both the feature representations and the generated image captions improve during the collaborative learning process. Experimental results on the MSCOCO 2017 dataset demonstrate the effectiveness of our approach relative to the compared methods.
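The two-module layout in the abstract can be illustrated with a toy sketch in which one shared extractor feeds both a reconstruction branch and a captioning branch, so a single objective depends on the same features. Everything below is an assumption made for illustration only: the function names, the pair-averaging "extractor", the mean-squared-error and cross-entropy losses, and the `caption_weight` parameter do not come from the paper, which this record does not describe at that level of detail.

```python
import math
from typing import List, Sequence


def extract_features(image: Sequence[float]) -> List[float]:
    """Toy shared extractor (hypothetical): average each adjacent pixel pair.

    Assumes an even number of pixels.
    """
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image) - 1, 2)]


def reconstruct(features: Sequence[float]) -> List[float]:
    """Toy stand-in for the P2PM decoder: repeat each feature twice."""
    return [f for f in features for _ in range(2)]


def reconstruction_loss(image: Sequence[float], recon: Sequence[float]) -> float:
    """Mean squared error between the image and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(image, recon)) / len(image)


def caption_loss(features: Sequence[float], target_token: int) -> float:
    """Toy stand-in for the NLGM: treat features as logits over a tiny
    vocabulary and return the cross-entropy of the target token."""
    log_norm = math.log(sum(math.exp(f) for f in features))
    return log_norm - features[target_token]


def joint_loss(image: Sequence[float], target_token: int,
               caption_weight: float = 1.0) -> float:
    """Assumed joint objective: both terms are computed from the same features."""
    features = extract_features(image)
    recon_term = reconstruction_loss(image, reconstruct(features))
    return recon_term + caption_weight * caption_loss(features, target_token)


image = [0.0, 1.0, 1.0, 1.0]
total = joint_loss(image, target_token=1)
```

Because both loss terms are functions of the output of `extract_features`, any update that lowers the joint loss adjusts the shared features for both tasks at once, which is the general idea behind training the two modules together.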

Pages: 249-261

Page count: 13
