Learning to Guide Decoding for Image Captioning

Cited by: 0
Authors
Jiang, Wenhao [1]
Ma, Lin [1]
Chen, Xinpeng [2]
Zhang, Hanwang [3]
Liu, Wei [1]
Affiliations
[1] Tencent AI Lab, Bellevue, WA 98004 USA
[2] Wuhan Univ, Wuhan, Hubei, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
Source
THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework that adds a component called the guiding network. The guiding network models the attribute properties of input images, and its output is used to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained end to end. Hence, the guiding vector can be adaptively learned according to the signal from the decoder, allowing it to embed information from both the image and the language. Additionally, discriminative supervision can be employed to further improve the quality of the guidance. The advantages of the proposed approach are verified by experiments on the MS COCO dataset.
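The abstract describes the guiding network only at a high level. The following is a minimal sketch of the data flow under loudly stated assumptions, not the authors' implementation: the guiding network is modeled as mean pooling over encoder region features followed by a linear map and tanh; the decoder input at each time step is the concatenation of the previous word's embedding and the guiding vector; and discriminative supervision is modeled as a multi-label attribute loss (sigmoid cross-entropy) on predictions made from the guiding vector. All function names (`guiding_vector`, `decoder_input`, `attribute_loss`), weight names (`W_g`, `W_a`), and dimensions are hypothetical, and the end-to-end gradient updates are omitted.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def guiding_vector(region_feats, W_g):
    """Hypothetical guiding network: mean-pool the encoder's region
    features, then apply a linear map and tanh. The paper's actual
    architecture may differ."""
    n = len(region_feats)
    dim = len(region_feats[0])
    pooled = [sum(f[k] for f in region_feats) / n for k in range(dim)]
    return [math.tanh(z) for z in matvec(W_g, pooled)]


def decoder_input(word_emb, v):
    """Assumed composition of the decoder input at one time step:
    concatenate the previous word's embedding with the guiding vector."""
    return list(word_emb) + list(v)


def attribute_loss(v, W_a, labels):
    """Assumed form of discriminative supervision: predict multi-label
    image attributes from the guiding vector and average a numerically
    stable sigmoid cross-entropy over the attributes."""
    logits = matvec(W_a, v)
    total = 0.0
    for z, y in zip(logits, labels):
        # max(z, 0) - z*y + log(1 + exp(-|z|)) equals BCE on sigmoid(z)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(labels)


# Toy dimensions chosen only for illustration.
random.seed(0)
d_feat, d_guide, d_emb, n_attr, n_regions = 4, 3, 5, 6, 7
regions = [[random.gauss(0, 1) for _ in range(d_feat)] for _ in range(n_regions)]
W_g = [[random.gauss(0, 0.1) for _ in range(d_feat)] for _ in range(d_guide)]
W_a = [[random.gauss(0, 0.1) for _ in range(d_guide)] for _ in range(n_attr)]

v = guiding_vector(regions, W_g)
# In this sketch the same v is concatenated at every decoding step.
x_t = decoder_input([0.0] * d_emb, v)
loss = attribute_loss(v, W_a, [1, 0, 0, 1, 0, 0])
print(len(x_t))  # 8
```

Concatenation is only one plausible way to compose the decoder input; it is used here because it keeps the guiding vector available at every step without altering the word embedding. In a real model, the captioning loss and the attribute loss would be back-propagated jointly through `W_g`, which is what lets the guiding vector absorb signal from the decoder.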
Pages: 6959-6966
Page count: 8
Related papers (50 in total)
  • [31] Prompt-Based Learning for Unpaired Image Captioning
    Zhu, Peipei
    Wang, Xiao
    Zhu, Lin
    Sun, Zhenglong
    Zheng, Wei-Shi
    Wang, Yaowei
    Chen, Changwen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 379 - 393
  • [32] Image Captioning using Reinforcement Learning with BLUDEr Optimization
    Devi, P. R.
    Thrivikraman, V
    Kashyap, D.
    Shylaja, S. S.
    PATTERN RECOGNITION AND IMAGE ANALYSIS, 2020, 30 (04) : 607 - 613
  • [33] High-Order Interaction Learning for Image Captioning
    Wang, Yanhui
    Xu, Ning
    Liu, An-An
    Li, Wenhui
    Zhang, Yongdong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (07) : 4417 - 4430
  • [34] Contrastive semantic similarity learning for image captioning evaluation
    Zeng, Chao
    Kwong, Sam
    Zhao, Tiesong
    Wang, Hanli
    INFORMATION SCIENCES, 2022, 609 : 913 - 930
  • [35] Image Change Captioning by Learning from an Auxiliary Task
    Hosseinzadeh, Mehrdad
    Wang, Yang
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 2724 - 2733
  • [36] Generative image captioning in Urdu using deep learning
    Afzal, M.K.
    Shardlow, M.
    Tuarob, S.
    Zaman, F.
    Sarwar, R.
    Ali, M.
    Aljohani, N.R.
    Lytras, M.D.
    Nawaz, R.
    Hassan, S.-U.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (06) : 7719 - 7731
  • [37] Dual Learning for Cross-domain Image Captioning
    Zhao, Wei
    Xu, Wei
    Yang, Min
    Ye, Jianbo
    Zhao, Zhou
    Feng, Yabing
    Qiao, Yu
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 29 - 38
  • [38] Learning Cooperative Neural Modules for Stylized Image Captioning
    Wu, Xinxiao
    Zhao, Wentian
    Luo, Jiebo
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (09) : 2305 - 2320
  • [39] Multitask Learning for Cross-Domain Image Captioning
    Yang, Min
    Zhao, Wei
    Xu, Wei
    Feng, Yabing
    Zhao, Zhou
    Chen, Xiaojun
    Lei, Kai
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (04) : 1047 - 1061
  • [40] Image Captioning using Adversarial Networks and Reinforcement Learning
    Yan, Shiyang
    Wu, Fangyu
    Smith, Jeremy S.
    Lu, Wenjin
    Zhang, Bailing
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 248 - 253