Collaborative Learning Method for Natural Image Captioning

Authors
Wang, Rongzhao [1]
Liu, Libo [1]
Affiliation
[1] Ningxia Univ, Sch Informat Engn, Yinchuan, Peoples R China
Source
DATA SCIENCE (ICPCSEE 2022), PT I | 2022 / Vol. 1628
Keywords
Image captioning; Pix2pix inverting; Collaborative learning;
DOI
10.1007/978-981-19-5194-7_19
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
We propose a collaborative learning method for natural image captioning. Many existing methods use CNNs pretrained for image classification to obtain the feature representations used for caption generation, which ignores the gap in image feature representations between different computer vision tasks. To address this problem, our method exploits the similarity between the image captioning and pix-to-pix inverting tasks to narrow the feature representation gap. Specifically, our framework consists of two modules: 1) the pix2pix module (P2PM), which has a shared feature extractor that extracts feature representations and a U-net architecture that encodes the image into a latent code and then decodes it back to the original image; 2) the natural language generation module (NLGM), which generates descriptions from the feature representations extracted by P2PM. Consequently, both the feature representations and the generated image captions improve during the collaborative learning process. Experimental results on the MSCOCO 2017 dataset demonstrate the effectiveness of our approach relative to the compared methods.
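The two-module layout described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the loss functions or how they are combined, so the summed objective, the `caption_weight` parameter, the single linear layers standing in for the CNN extractor, U-net, and language generator, and all function names below are assumptions made purely for illustration. The only idea taken from the abstract is that a reconstruction branch and a captioning branch both read from one shared feature extractor, so training either branch updates the shared features.

```python
import math
import random

def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def extract_features(image, w_shared):
    """Shared feature extractor (toy stand-in): linear layer plus tanh."""
    return [math.tanh(z) for z in linear(w_shared, image)]

def reconstruction_loss(image, features, w_dec):
    """P2PM-like branch: mean squared error of the reconstructed input."""
    recon = linear(w_dec, features)
    return sum((a - b) ** 2 for a, b in zip(image, recon)) / len(image)

def caption_loss(features, w_cap, target_token):
    """NLGM-like branch: cross-entropy of one target token (softmax)."""
    logits = linear(w_cap, features)
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target_token]

def joint_loss(params, image, target_token, caption_weight=1.0):
    """Assumed objective: both losses computed from the same features."""
    w_shared, w_dec, w_cap = params
    features = extract_features(image, w_shared)
    return (reconstruction_loss(image, features, w_dec)
            + caption_weight * caption_loss(features, w_cap, target_token))

def train_step(params, image, target_token, lr=0.05, eps=1e-5):
    """One gradient-descent step using central finite differences."""
    grads = []
    for matrix in params:
        grad_matrix = []
        for row in matrix:
            grad_row = []
            for j in range(len(row)):
                original = row[j]
                row[j] = original + eps
                loss_plus = joint_loss(params, image, target_token)
                row[j] = original - eps
                loss_minus = joint_loss(params, image, target_token)
                row[j] = original
                grad_row.append((loss_plus - loss_minus) / (2 * eps))
            grad_matrix.append(grad_row)
        grads.append(grad_matrix)
    # Apply all updates only after every gradient has been estimated.
    for matrix, grad_matrix in zip(params, grads):
        for row, grad_row in zip(matrix, grad_matrix):
            for j, g in enumerate(grad_row):
                row[j] -= lr * g

def init_matrix(rng, rows, cols):
    """Small random weights from a seeded generator."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
image = [0.1, 0.9, 0.4, 0.7]   # hypothetical 4-pixel "image"
target_token = 2               # hypothetical index in a 5-word vocabulary
params = [
    init_matrix(rng, 3, 4),    # shared extractor: 4 pixels -> 3 features
    init_matrix(rng, 4, 3),    # reconstruction head: 3 features -> 4 pixels
    init_matrix(rng, 5, 3),    # caption head: 3 features -> 5 token logits
]
initial_loss = joint_loss(params, image, target_token)
for _ in range(20):
    train_step(params, image, target_token)
final_loss = joint_loss(params, image, target_token)
print(f"joint loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because `w_shared` receives gradient from both loss terms, the extracted features are shaped by reconstruction and captioning together, which is the intuition behind the collaborative setup. A real system would replace the finite-difference step with automatic differentiation and the linear layers with the networks named in the abstract.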
Pages: 249-261
Page count: 13