Learning to Guide Decoding for Image Captioning

Cited by: 0
Authors
Jiang, Wenhao [1 ]
Ma, Lin [1 ]
Chen, Xinpeng [2 ]
Zhang, Hanwang [3 ]
Liu, Wei [1 ]
Affiliations
[1] Tencent AI Lab, Bellevue, WA 98004 USA
[2] Wuhan University, Wuhan, Hubei, People's Republic of China
[3] Nanyang Technological University, Singapore
Source
THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework that adds a component called the guiding network. The guiding network models the attribute properties of input images, and its output is used to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained in an end-to-end manner. Hence, the guiding vector can be learned adaptively from the decoder's training signal, so that it embeds information from both image and language. Additionally, discriminative supervision can be employed to further improve the quality of the guidance. The advantages of the proposed approach are verified by experiments on the MS COCO dataset.
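The composition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mean pooling, the single tanh layer, the use of concatenation to combine the guiding vector with each word embedding, and all names and dimensions (guiding_network, decoder_inputs, init_linear) are assumptions chosen only to show the idea of feeding one image-derived vector into the decoder at every time step.

```python
import math
import random


def linear(x, W, b):
    """Return W @ x + b for a list-of-lists matrix W and a list vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_linear(rng, n_out, n_in):
    """Small random weights and zero bias (hypothetical initialisation)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def guiding_network(region_feats, W, b):
    """Assumed guiding network: mean-pool region features, then linear + tanh."""
    n = len(region_feats)
    pooled = [sum(col) / n for col in zip(*region_feats)]
    return [math.tanh(z) for z in linear(pooled, W, b)]


def decoder_inputs(word_embs, guide):
    """Compose the decoder input at each time step.

    Assumption: the guiding vector is concatenated to every word embedding.
    """
    return [emb + guide for emb in word_embs]


rng = random.Random(0)
# Toy encoder output: 4 image regions, each an 8-dimensional feature.
regions = [[rng.random() for _ in range(8)] for _ in range(4)]
W_g, b_g = init_linear(rng, 3, 8)
guide = guiding_network(regions, W_g, b_g)  # 3-dimensional guiding vector

# Toy caption prefix: 6 time steps of 5-dimensional word embeddings.
words = [[rng.random() for _ in range(5)] for _ in range(6)]
inputs = decoder_inputs(words, guide)
print(len(inputs), len(inputs[0]))  # 6 time steps, each 5 + 3 = 8 dimensions
```

In an end-to-end setup the parameters W_g and b_g would receive gradients from the decoder's loss, which is how the guiding vector could come to carry language information as well as image information; the sketch shows the forward pass only.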
Pages: 6959-6966
Page count: 8
Related papers
50 in total
  • [41] A Multi-task Learning Approach for Image Captioning
    Zhao, Wei
    Wang, Benyou
    Ye, Jianbo
    Yang, Min
    Zhao, Zhou
    Luo, Ruotian
    Qiao, Yu
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 1205 - 1211
  • [42] A Hybridized Deep Learning Method for Bengali Image Captioning
    Humaira, Mayeesha
    Paul, Shimul
    Jim, Md Abidur Rahman Khan
    Ami, Amit Saha
    Shah, Faisal Muhammad
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (02) : 698 - 707
  • [43] Image Captioning using Reinforcement Learning with BLUDEr Optimization
    P. R. Devi
    V. Thrivikraman
    D. Kashyap
    S. S. Shylaja
    Pattern Recognition and Image Analysis, 2020, 30 : 607 - 613
  • [44] Learning Cooperative Neural Modules for Stylized Image Captioning
    Xinxiao Wu
    Wentian Zhao
    Jiebo Luo
    International Journal of Computer Vision, 2022, 130 : 2305 - 2320
  • [45] Structural Semantic Adversarial Active Learning for Image Captioning
    Zhang, Beichen
    Li, Liang
    Su, Li
    Wang, Shuhui
    Deng, Jincan
    Zha, Zheng-Jun
    Huang, Qingming
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1112 - 1121
  • [46] Learning joint relationship attention network for image captioning
    Wang, Changzhi
    Gu, Xiaodong
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 211
  • [47] Image Captioning Using Multimodal Deep Learning Approach
    Farkh, Rihem
    Oudinet, Ghislain
    Foued, Yasser
    Computers, Materials and Continua, 2024, 81 (03) : 3951 - 3968
  • [48] Deep learning-based solar image captioning
    Baek, Ji-Hye
    Kim, Sujin
    Choi, Seonghwan
    Park, Jongyeob
    Kim, Dongil
    ADVANCES IN SPACE RESEARCH, 2024, 73 (06) : 3270 - 3281
  • [49] Enhancing Image Captioning with Transformer-Based Two-Pass Decoding Framework
    Su, Jindian
    Mou, Yueqi
    Xie, Yunhao
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT I, ICIC 2024, 2024, 14875 : 171 - 183
  • [50] Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects
    Yao, Ting
    Pan, Yingwei
    Li, Yehao
    Mei, Tao
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 5263 - 5271