Learning to Guide Decoding for Image Captioning

Authors
Jiang, Wenhao [1]
Ma, Lin [1]
Chen, Xinpeng [2]
Zhang, Hanwang [3]
Liu, Wei [1]
Affiliations
[1] Tencent AI Lab, Bellevue, WA 98004 USA
[2] Wuhan Univ, Wuhan, Hubei, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
Source
THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework that adds a component called the guiding network. The guiding network models the attribute properties of input images, and its output is used to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained end to end. Hence, the guiding vector can be learned adaptively according to the signal from the decoder, so that it embeds information from both image and language. Additionally, discriminative supervision can be employed to further improve the quality of the guidance. The advantages of the proposed approach are verified by experiments on the MS COCO dataset.
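The idea of composing the decoder input from a guiding vector at each time step can be illustrated with a minimal sketch. This record does not give the architecture, so everything concrete here is an assumption made for illustration: the guiding network is taken to be mean pooling followed by a linear map and tanh, the decoder is a plain RNN cell rather than whatever recurrent unit the paper uses, and all names and sizes (`guiding_network`, `decoder_step`, `GUIDE_DIM`, and so on) are hypothetical. End-to-end training and the discriminative supervision are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_REGIONS, FEAT_DIM = 5, 8    # encoder output: one feature vector per image region
GUIDE_DIM, EMBED_DIM = 4, 6     # guiding vector size and word embedding size
HIDDEN_DIM, VOCAB = 7, 10       # decoder state size and vocabulary size


def guiding_network(features, w_guide):
    """Assumed form: mean-pool the region features, then linear map + tanh."""
    pooled = features.mean(axis=0)        # shape (FEAT_DIM,)
    return np.tanh(w_guide @ pooled)      # shape (GUIDE_DIM,)


def decoder_step(word_vec, guide_vec, hidden, w_in, w_rec):
    """Plain RNN step whose input is [word embedding; guiding vector]."""
    step_input = np.concatenate([word_vec, guide_vec])
    return np.tanh(w_in @ step_input + w_rec @ hidden)


# Random stand-ins for encoder features and learned parameters.
features = rng.normal(size=(NUM_REGIONS, FEAT_DIM))
w_guide = rng.normal(size=(GUIDE_DIM, FEAT_DIM))
embeddings = rng.normal(size=(VOCAB, EMBED_DIM))
w_in = rng.normal(size=(HIDDEN_DIM, EMBED_DIM + GUIDE_DIM))
w_rec = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM))

# The guiding vector is computed once per image and reused at every step.
guide = guiding_network(features, w_guide)
hidden = np.zeros(HIDDEN_DIM)
for token_id in [1, 4, 2]:                # arbitrary token ids for the demo
    hidden = decoder_step(embeddings[token_id], guide, hidden, w_in, w_rec)

print(guide.shape, hidden.shape)
```

Because the guiding vector enters the decoder input at every step, gradients from the caption loss would reach `w_guide` during training, which is one way to read the abstract's statement that the guidance is learned according to the signal from the decoder.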
Pages: 6959 - 6966
Page count: 8