Learning to Guide Decoding for Image Captioning

Cited by: 0

Authors:

Jiang, Wenhao [1]

Ma, Lin [1]

Chen, Xinpeng [2]

Zhang, Hanwang [3]

Liu, Wei [1]

Affiliations:

[1] Tencent AI Lab, Bellevue, WA 98004 USA

[2] Wuhan University, Wuhan, Hubei, People's Republic of China

[3] Nanyang Technological University, Singapore, Singapore

Source:

THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework that adds a component called the guiding network. The guiding network models the attribute properties of input images, and its output is used to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained in an end-to-end manner. Hence, the guiding vector can be learned adaptively according to the signal from the decoder, so that it embeds information from both the image and the language. Additionally, discriminative supervision can be employed to further improve the quality of the guidance. The advantages of the proposed approach are verified by experiments on the MS COCO dataset.
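
The abstract's central idea, a guiding vector that takes part in the decoder input at every time step, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the mean pooling over region features, the single affine layer with tanh, and the use of concatenation to "compose" the decoder input are all hypothetical choices.

```python
import math


def linear(weights, bias, x):
    """Affine map: `weights` is a list of rows, `x` a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def guiding_network(region_features, weights, bias):
    """Hypothetical guiding network (assumed form, not the paper's):
    mean-pool the image region features, then apply affine + tanh."""
    dim = len(region_features[0])
    count = len(region_features)
    pooled = [sum(r[i] for r in region_features) / count for i in range(dim)]
    return [math.tanh(z) for z in linear(weights, bias, pooled)]


def compose_decoder_inputs(word_embeddings, guiding_vector):
    """Assumed composition: concatenate the same guiding vector to the
    word embedding at each time step."""
    return [embedding + guiding_vector for embedding in word_embeddings]


# Toy data: 3 image regions with 4-dimensional features.
regions = [[0.2, 0.4, 0.1, 0.9],
           [0.5, 0.1, 0.3, 0.7],
           [0.8, 0.6, 0.2, 0.5]]
# Toy parameters mapping 4 dimensions to a 2-dimensional guiding vector.
W = [[0.1, -0.2, 0.3, 0.0],
     [0.0, 0.5, -0.1, 0.2]]
b = [0.0, 0.0]

guide = guiding_network(regions, W, b)

# Two time steps, each with a 3-dimensional word embedding.
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
decoder_inputs = compose_decoder_inputs(embeddings, guide)
```

In an end-to-end setup, `W` and `b` would receive gradients from the decoder's loss, which is how the guiding vector could come to reflect both image and language information; that training step is omitted from this sketch.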

Pages: 6959-6966

Number of pages: 8
