Improving Image Captioning with Conditional Generative Adversarial Nets

Cited by: 0
Authors
Chen, Chen [1]
Mu, Shuai [1]
Xiao, Wanpeng [1]
Ye, Zexiong [1]
Wu, Liesi [1]
Ju, Qi [1]
Affiliations
[1] Tencent AI Lab, Shenzhen 518000, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel conditional-generative-adversarial-nets-based image captioning framework as an extension of the traditional reinforcement-learning (RL)-based encoder-decoder architecture. To deal with the inconsistent evaluation problem among different objective language metrics, we design "discriminator" networks that automatically and progressively determine whether a generated caption is human-described or machine-generated. Two kinds of discriminator architectures (CNN-based and RNN-based structures) are introduced, since each has its own advantages. The proposed algorithm is generic, so it can enhance any existing RL-based image captioning framework, and we show that the conventional RL training method is just a special case of our approach. Empirically, we show consistent improvements over all language evaluation metrics for different state-of-the-art image captioning models. In addition, the well-trained discriminators can also be viewed as objective image captioning evaluators.
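The abstract does not state how the discriminator's judgment enters RL training. As a minimal sketch, assume the discriminator's probability that a caption is human-written is blended linearly with a language-metric score (for example CIDEr) to form the reward. Under that assumption, a blend weight of zero leaves the metric-only reward, which is one way conventional RL training could appear as a special case. The function name `blended_reward` and the parameter `lam` are hypothetical and are not taken from this record.

```python
def blended_reward(disc_prob: float, metric_score: float, lam: float) -> float:
    """Illustrative reward for one sampled caption (an assumption, not the paper's stated formula).

    disc_prob    -- discriminator's probability that the caption is human-written
    metric_score -- score from an objective language metric
    lam          -- hypothetical blend weight in [0, 1]
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Linear blend: lam weights the discriminator, (1 - lam) weights the metric.
    return lam * disc_prob + (1.0 - lam) * metric_score


# lam = 0 ignores the discriminator and returns the metric score unchanged.
metric_only = blended_reward(disc_prob=0.9, metric_score=0.4, lam=0.0)

# lam = 1 uses the discriminator's judgment alone.
discriminator_only = blended_reward(disc_prob=0.9, metric_score=0.4, lam=1.0)
```

In this sketch the weight is the only knob: intermediate values trade off agreement with a fixed metric against the learned judgment of the discriminator.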
Pages: 8142-8150
Page count: 9
Related papers
50 in total
  • [1] Image Captioning Based on Conditional Generative Adversarial Nets
    Huang Y.
    Bai C.
    Li H.
    Zhang J.
    Chen S.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2020, 32 (06): 911 - 918
  • [2] TRIPLE SEQUENCE GENERATIVE ADVERSARIAL NETS FOR UNSUPERVISED IMAGE CAPTIONING
    Zhou, Yucheng
    Tao, Wei
    Zhang, Wenqiang
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 7598 - 7602
  • [3] IMAGE EDGE DETECTION BASE ON CONDITIONAL GENERATIVE ADVERSARIAL NETS
    He, Mingyun
    Wu, Yulun
    Li, Xiaofang
    Liu, Jinyi
    Gu, Xiaofeng
    2018 15TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2018: 18 - 21
  • [4] Image Captioning with Generative Adversarial Network
    Amirian, Soheyla
    Rasheed, Khaled
    Taha, Thiab R.
    Arabnia, Hamid R.
    2019 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI 2019), 2019: 272 - 275
  • [5] Sequence generative adversarial nets with a conditional discriminator
    Yan, Yongfei
    Shen, Gehui
    Zhang, Song
    Huang, Ting
    Deng, Zhi-Hong
    Yun, Unil
    NEUROCOMPUTING, 2021, 429 : 69 - 76
  • [6] Visible-to-infrared Image Translation Based on an Improved Conditional Generative Adversarial Nets
    Ma Decao
    Xian Yong
    Su Juan
    Li Shaopeng
    Li Bing
    ACTA PHOTONICA SINICA, 2023, 52 (04)
  • [7] Medical Image Segmentation Using Semi-supervised Conditional Generative Adversarial Nets
    Liu S.-P.
    Hong J.-M.
    Liang J.-P.
    Jia X.-P.
    Ouyang J.
    Yin J.
    Ruan Jian Xue Bao/Journal of Software, 2020, 31 (08): 2588 - 2602
  • [8] Interactive Dual Generative Adversarial Networks for Image Captioning
    Liu, Junhao
    Wang, Kai
    Xu, Chunpu
    Zhao, Zhou
    Xu, Ruifeng
    Shen, Ying
    Yang, Min
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11588 - 11595
  • [9] Conditional Generative Adversarial Nets Classifier for Spoken Language Identification
    Shen, Peng
    Lu, Xugang
    Li, Sheng
    Kawai, Hisashi
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 2814 - 2818
  • [10] A Road Extraction Method Based on Conditional Generative Adversarial Nets
    Lu, Chuanwei
    Sun, Qun
    Zhao, Yunpeng
    Sun, Shijie
    Ma, Jingzhen
    Cheng, Mianmian
    Li, Yuanfu
    Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2021, 46 (06): 807 - 815