Learning to Guide Decoding for Image Captioning

Cited by: 0
Authors
Jiang, Wenhao [1 ]
Ma, Lin [1 ]
Chen, Xinpeng [2 ]
Zhang, Hanwang [3 ]
Liu, Wei [1 ]
Affiliations
[1] Tencent AI Lab, Bellevue, WA 98004 USA
[2] Wuhan Univ, Wuhan, Hubei, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
Source
THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework that adds a component called the guiding network. The guiding network models the attribute properties of input images, and its output is used to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained in an end-to-end manner. Hence, the guiding vector can be learned adaptively according to the signal from the decoder, so that it embeds information from both image and language. Additionally, discriminative supervision can be employed to further improve the quality of the guidance. The advantages of the proposed approach are verified by experiments on the MS COCO dataset.
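The idea of feeding a guiding vector into the decoder at every time step can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the single tanh layer used as the guiding network, the plain RNN decoder, the use of concatenation to compose the decoder input, and every name and dimension below are assumptions made for illustration. Training, where the guiding network would receive gradients from the decoder loss, is not shown.

```python
import math
import random


def linear(weights, bias, x):
    """Return W @ x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_linear(rng, out_dim, in_dim):
    """Small random weights and zero bias (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def guiding_vector(params, image_features):
    """Assumed guiding network: one tanh layer over the image features."""
    weights, bias = params
    return [math.tanh(z) for z in linear(weights, bias, image_features)]


def decoder_step(params, word_embedding, guide, hidden):
    """One plain RNN step; the input is [word embedding ; guiding vector]."""
    weights, bias = params
    step_input = word_embedding + guide + hidden  # list concatenation
    return [math.tanh(z) for z in linear(weights, bias, step_input)]


def decode(guide_params, dec_params, image_features, word_embeddings, hidden_dim):
    """Compute the guiding vector once, then reuse it at every time step."""
    guide = guiding_vector(guide_params, image_features)
    hidden = [0.0] * hidden_dim
    states = []
    for embedding in word_embeddings:
        hidden = decoder_step(dec_params, embedding, guide, hidden)
        states.append(hidden)
    return guide, states


# Toy dimensions and random inputs, chosen only for the example.
rng = random.Random(0)
IMG_DIM, GUIDE_DIM, EMB_DIM, HID_DIM = 6, 3, 4, 5
guide_params = init_linear(rng, GUIDE_DIM, IMG_DIM)
dec_params = init_linear(rng, HID_DIM, EMB_DIM + GUIDE_DIM + HID_DIM)
image_features = [rng.random() for _ in range(IMG_DIM)]
word_embeddings = [[rng.random() for _ in range(EMB_DIM)] for _ in range(3)]

guide, states = decode(guide_params, dec_params, image_features,
                       word_embeddings, HID_DIM)
```

In this sketch the same guiding vector is appended to each word embedding, so the decoder sees image-derived guidance at every step rather than only at initialisation.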
Pages: 6959-6966
Page count: 8
Related papers
50 in total
  • [21] Image Captioning with Partially Rewarded Imitation Learning
    Yu, Xintong
    Guo, Tszhang
    Fu, Kun
    Li, Lei
    Zhang, Changshui
    Zhang, Jianwei
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [22] Facilitated Deep Learning Models for Image Captioning
    Azhar, Imtinan
    Afyouni, Imad
    Elnagar, Ashraf
    2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021,
  • [23] CaMEL: Mean Teacher Learning for Image Captioning
    Barraco, Manuele
    Stefanini, Matteo
    Cornia, Marcella
    Cascianelli, Silvia
    Baraldi, Lorenzo
    Cucchiara, Rita
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4087 - 4094
  • [24] Neural Symbolic Representation Learning for Image Captioning
    Wang, Xiaomei
    Ma, Lin
    Fu, Yanwei
    Xue, Xiangyang
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 312 - 321
  • [25] Collaborative Learning Method for Natural Image Captioning
    Wang, Rongzhao
    Liu, Libo
    DATA SCIENCE (ICPCSEE 2022), PT I, 2022, 1628 : 249 - 261
  • [26] A TextGCN-Based Decoding Approach for Improving Remote Sensing Image Captioning
    Das, Swadhin
    Sharma, Raksha
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2025, 22
  • [27] SCAP: enhancing image captioning through lightweight feature sifting and hierarchical decoding
    Zhang, Yuhao
    Tong, Jiaqi
    Liu, Honglin
    VISUAL COMPUTER, 2025,
  • [28] Image and Video Captioning for Apparels Using Deep Learning
    Agarwal, Govind
    Jindal, Kritika
    Chowdhury, Abishi
    Singh, Vishal K.
    Pal, Amrit
    IEEE ACCESS, 2024, 12 : 113138 - 113150
  • [29] Learning Combinatorial Prompts for Universal Controllable Image Captioning
    Wang, Zhen
    Xiao, Jun
    Zhuang, Yueting
    Gao, Fei
    Shao, Jian
    Chen, Long
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (01) : 129 - 150
  • [30] Reinforcement Learning Transformer for Image Captioning Generation Model
    Yan, Zhaojie
    FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2022, 2023, 12701