Learning to Guide Decoding for Image Captioning

Cited by: 0

Authors:

Jiang, Wenhao [1]

Ma, Lin [1]

Chen, Xinpeng [2]

Zhang, Hanwang [3]

Liu, Wei [1]

Affiliations:

[1] Tencent AI Lab, Bellevue, WA 98004 USA

[2] Wuhan Univ, Wuhan, Hubei, Peoples R China

[3] Nanyang Technol Univ, Singapore, Singapore

Source:

THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework that adds a component called the guiding network. The guiding network models the attribute properties of input images, and its output is used to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained in an end-to-end manner. Hence, the guiding vector can be adaptively learned according to the signal from the decoder, allowing it to embed information from both image and language. Additionally, discriminative supervision can be employed to further improve the quality of guidance. The advantages of the proposed approach are verified by experiments on the MS COCO dataset.
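The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`guiding_vector`, `compose_decoder_inputs`), the mean pooling over region features, the single affine layer with tanh, and the use of concatenation to compose the decoder input are all assumptions made for illustration. The abstract only states that the guiding network's output is used to compose the decoder input at each time step.

```python
import math


def linear(weights, bias, x):
    """Affine map: each row of `weights` produces one output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def guiding_vector(region_features, weights, bias):
    """Hypothetical guiding network.

    Assumption: mean-pool the encoder's region features, then apply one
    affine layer followed by tanh. The real network may differ.
    """
    n = len(region_features)
    dim = len(region_features[0])
    pooled = [sum(r[d] for r in region_features) / n for d in range(dim)]
    return [math.tanh(z) for z in linear(weights, bias, pooled)]


def compose_decoder_inputs(word_embeddings, guide):
    """Append the same guiding vector to the word embedding at every step.

    Assumption: "compose" is modeled here as plain concatenation.
    """
    return [emb + guide for emb in word_embeddings]


# Toy example: two image regions with 2-d features, a 3-d guiding vector.
regions = [[1.0, 3.0], [3.0, 5.0]]
W = [[0.5, 0.0], [0.0, 0.25], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]
guide = guiding_vector(regions, W, b)

# Two time steps, each with a 2-d word embedding.
embeddings = [[0.1, 0.2], [0.3, 0.4]]
decoder_inputs = compose_decoder_inputs(embeddings, guide)
```

In an end-to-end setting, `W` and `b` would be trained jointly with the decoder, so the captioning loss would shape the guiding vector. That is the sense in which the abstract says the vector is learned from the decoder's signal.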

Pages: 6959-6966

Page count: 8

