Collaborative Learning Method for Natural Image Captioning

被引:0
|
作者
Wang, Rongzhao [1 ]
Liu, Libo [1 ]
机构
[1] Ningxia Univ, Sch Informat Engn, Yinchuan, Peoples R China
来源
DATA SCIENCE (ICPCSEE 2022), PT I | 2022年 / 1628卷
关键词
Image captioning; Pix2pix inverting; Collaborative learning;
D O I
10.1007/978-981-19-5194-7_19
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We propose a collaborative learning method to solve the natural image captioning problem. Numerous existing methods use pretrained image classification CNNs to obtain feature representations for image caption generation, which ignores the gap in image feature representations between different computer vision tasks. To address this problem, our method aims to utilize the similarity between image caption and pix-to-pix inverting tasks to ease the feature representation gap. Specifically, our framework consists of two modules: 1) The pix2pix module (P2PM), which has a share learning feature extractor to extract feature representations and a U-net architecture to encode the image to latent code and then decodes them to the original image. 2) The natural language generation module (NLGM) generates descriptions from feature representations extracted by P2PM. Consequently, the feature representations and generated image captions are improved during the collaborative learning process. The experimental results on the MSCOCO 2017 dataset prove the effectiveness of our approach compared to other comparison methods.
引用
收藏
页码:249 / 261
页数:13
相关论文
共 50 条
  • [21] Deep learning-based solar image captioning
    Baek, Ji-Hye
    Kim, Sujin
    Choi, Seonghwan
    Park, Jongyeob
    Kim, Dongil
    ADVANCES IN SPACE RESEARCH, 2024, 73 (06) : 3270 - 3281
  • [22] A Two-Step Retrieval Method for Image Captioning
    Pellegrin, Luis
    Vanegas, Jorge A.
    Arevalo, John
    Beltran, Viviana
    Jair Escalante, Hugo
    Montes-y-Gomez, Manuel
    Gonzalez, Fabio A.
    EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION, CLEF 2016, 2016, 9822 : 150 - 161
  • [23] An Image Captioning Method for Infant Sleeping Environment Diagnosis
    Liu, Xinyi
    Milanova, Mariofanna
    MULTIMODAL PATTERN RECOGNITION OF SOCIAL SIGNALS IN HUMAN-COMPUTER-INTERACTION, MPRSS 2018, 2019, 11377 : 18 - 26
  • [24] Semantic-Spatial Collaborative Perception Network for Remote Sensing Image Captioning
    Wang, Qi
    Yang, Zhigang
    Ni, Weiping
    Wu, Junzheng
    Li, Qiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [25] Image Captioning using Deep Learning: A Systematic Literature Review
    Chohan, Murk
    Khan, Adil
    Mahar, Muhammad Saleem
    Hassan, Saif
    Ghafoor, Abdul
    Khan, Mehmood
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (05) : 278 - 286
  • [26] Discriminative Style Learning for Cross-Domain Image Captioning
    Yuan, Jin
    Zhu, Shuai
    Huang, Shuyin
    Zhang, Hanwang
    Xiao, Yaoqiang
    Li, Zhiyong
    Wang, Meng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 1723 - 1736
  • [27] Learning Double-Level Relationship Networks for image captioning
    Wang, Changzhi
    Gu, Xiaodong
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [28] CONICA: A Contrastive Image Captioning Framework with Robust Similarity Learning
    Deng, Lin
    Zhong, Yuzhong
    Wang, Maoning
    Zhang, Jianwei
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5109 - 5119
  • [29] CASCADE ATTENTION: MULTIPLE FEATURE BASED LEARNING FOR IMAGE CAPTIONING
    Shi, Jiahe
    Li, Yali
    Wang, Shengjin
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 1970 - 1974
  • [30] Learning Text-to-Video Retrieval from Image Captioning
    Ventura, Lucas
    Schmid, Cordelia
    Varol, Gul
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, : 1834 - 1854