Collaborative Learning Method for Natural Image Captioning

Cited by: 0

Authors:

Wang, Rongzhao [1]

Liu, Libo [1]

Affiliations:

[1] Ningxia Univ, Sch Informat Engn, Yinchuan, Peoples R China

Source:

DATA SCIENCE (ICPCSEE 2022), PT I | 2022 / Vol. 1628

Keywords:

Image captioning; Pix2pix inverting; Collaborative learning

DOI:

10.1007/978-981-19-5194-7_19

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We propose a collaborative learning method for natural image captioning. Many existing methods use CNNs pretrained for image classification to obtain feature representations for caption generation, which ignores the gap in image feature representations between different computer vision tasks. To address this problem, our method exploits the similarity between the image captioning and pix-to-pix inverting tasks to narrow the feature representation gap. Specifically, our framework consists of two modules: 1) the pix2pix module (P2PM), which has a shared learning feature extractor to extract feature representations and a U-net architecture that encodes the image into a latent code and then decodes it back to the original image; and 2) the natural language generation module (NLGM), which generates descriptions from the feature representations extracted by P2PM. Consequently, both the feature representations and the generated image captions improve during the collaborative learning process. Experimental results on the MSCOCO 2017 dataset demonstrate the effectiveness of our approach compared with other methods.


Pages: 249-261

Page count: 13


[21] Deep learning-based solar image captioning
Baek, Ji-Hye
Kim, Sujin
Choi, Seonghwan
Park, Jongyeob
Kim, Dongil
ADVANCES IN SPACE RESEARCH, 2024, 73 (06) : 3270 - 3281
[22] A Two-Step Retrieval Method for Image Captioning
Pellegrin, Luis
Vanegas, Jorge A.
Arevalo, John
Beltran, Viviana
Jair Escalante, Hugo
Montes-y-Gomez, Manuel
Gonzalez, Fabio A.
EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION, CLEF 2016, 2016, 9822 : 150 - 161
[23] An Image Captioning Method for Infant Sleeping Environment Diagnosis
Liu, Xinyi
Milanova, Mariofanna
MULTIMODAL PATTERN RECOGNITION OF SOCIAL SIGNALS IN HUMAN-COMPUTER-INTERACTION, MPRSS 2018, 2019, 11377 : 18 - 26
[24] Semantic-Spatial Collaborative Perception Network for Remote Sensing Image Captioning
Wang, Qi
Yang, Zhigang
Ni, Weiping
Wu, Junzheng
Li, Qiang
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
[25] Image Captioning using Deep Learning: A Systematic Literature Review
Chohan, Murk
Khan, Adil
Mahar, Muhammad Saleem
Hassan, Saif
Ghafoor, Abdul
Khan, Mehmood
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (05) : 278 - 286
[26] Discriminative Style Learning for Cross-Domain Image Captioning
Yuan, Jin
Zhu, Shuai
Huang, Shuyin
Zhang, Hanwang
Xiao, Yaoqiang
Li, Zhiyong
Wang, Meng
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 1723 - 1736
[27] Learning Double-Level Relationship Networks for image captioning
Wang, Changzhi
Gu, Xiaodong
INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
[28] CONICA: A Contrastive Image Captioning Framework with Robust Similarity Learning
Deng, Lin
Zhong, Yuzhong
Wang, Maoning
Zhang, Jianwei
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5109 - 5119
[29] CASCADE ATTENTION: MULTIPLE FEATURE BASED LEARNING FOR IMAGE CAPTIONING
Shi, Jiahe
Li, Yali
Wang, Shengjin
2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 1970 - 1974
[30] Learning Text-to-Video Retrieval from Image Captioning
Ventura, Lucas
Schmid, Cordelia
Varol, Gul
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, : 1834 - 1854
