Learning to Guide Decoding for Image Captioning

Times cited: 0

Authors:

Jiang, Wenhao [1]

Ma, Lin [1]

Chen, Xinpeng [2]

Zhang, Hanwang [3]

Liu, Wei [1]

Affiliations:

[1] Tencent AI Lab, Bellevue, WA 98004 USA

[2] Wuhan Univ, Wuhan, Hubei, Peoples R China

[3] Nanyang Technol Univ, Singapore, Singapore

Source:

THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018

DOI: not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework that adds a component called the guiding network. The guiding network models the attribute properties of input images, and its output is used to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained end to end. Hence, the guiding vector can be adaptively learned according to the signal from the decoder, allowing it to embed information from both the image and the language. Additionally, discriminative supervision can be employed to further improve the quality of the guidance. The advantages of the proposed approach are verified by experiments on the MS COCO dataset.
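The abstract describes the guiding network only at a high level. Below is a minimal forward-pass sketch in plain Python, not the authors' implementation, and several details are assumptions: the guiding network here mean-pools the encoder's region features and applies a linear map followed by tanh; the decoder input at step t is assumed to be the concatenation of the previous word's embedding with the guiding vector; and the discriminative supervision is modeled as a multi-label sigmoid cross-entropy predicting which vocabulary words occur in the reference caption. All function names and weights (`guiding_vector`, `decoder_input`, `discriminative_loss`, `W_g`, `W_d`) are hypothetical, and end-to-end training is not shown.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def guiding_vector(region_feats, W_g):
    """Hypothetical guiding network (assumption): mean-pool the encoder's
    region features, then apply a linear map followed by tanh."""
    n, d = len(region_feats), len(region_feats[0])
    pooled = [sum(f[k] for f in region_feats) / n for k in range(d)]
    return [math.tanh(z) for z in matvec(W_g, pooled)]

def decoder_input(prev_word_emb, guide):
    """Assumed composition of the decoder input at step t: concatenate
    the previous word's embedding with the guiding vector."""
    return list(prev_word_emb) + list(guide)

def discriminative_loss(guide, W_d, caption_word_ids):
    """Assumed discriminative supervision: multi-label sigmoid cross-entropy
    predicting, from the guiding vector, which vocabulary words occur in the
    reference caption. Uses the numerically stable per-label form
    max(z, 0) - z * y + log(1 + exp(-|z|)), averaged over the vocabulary."""
    logits = matvec(W_d, guide)
    total = 0.0
    for k, z in enumerate(logits):
        y = 1.0 if k in caption_word_ids else 0.0
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Toy dimensions and random weights, for illustration only.
random.seed(0)
feat_dim, guide_dim, emb_dim, vocab_size = 4, 3, 2, 5
regions = [[random.random() for _ in range(feat_dim)] for _ in range(6)]
W_g = [[random.uniform(-0.5, 0.5) for _ in range(feat_dim)] for _ in range(guide_dim)]
W_d = [[random.uniform(-0.5, 0.5) for _ in range(guide_dim)] for _ in range(vocab_size)]

v = guiding_vector(regions, W_g)
x_t = decoder_input([0.1, -0.2], v)  # the same v is reused at every time step
loss = discriminative_loss(v, W_d, caption_word_ids={1, 3})
print(len(x_t))  # 5 = emb_dim + guide_dim
print(loss > 0)  # True
```

In this sketch the guiding vector is computed once per image and concatenated into every decoder step, so in an end-to-end setup the gradient from each step would also reach `W_g`; that is one plausible way the guiding vector could pick up language-side signal as the abstract describes.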


Pages: 6959-6966

Number of pages: 8

50 related records in total

[21] Image Captioning with Partially Rewarded Imitation Learning
Yu, Xintong
Guo, Tszhang
Fu, Kun
Li, Lei
Zhang, Changshui
Zhang, Jianwei
2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
[22] Facilitated Deep Learning Models for Image Captioning
Azhar, Imtinan
Afyouni, Imad
Elnagar, Ashraf
2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021
[23] CaMEL: Mean Teacher Learning for Image Captioning
Barraco, Manuele
Stefanini, Matteo
Cornia, Marcella
Cascianelli, Silvia
Baraldi, Lorenzo
Cucchiara, Rita
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022 : 4087 - 4094
[24] Neural Symbolic Representation Learning for Image Captioning
Wang, Xiaomei
Ma, Lin
Fu, Yanwei
Xue, Xiangyang
PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021 : 312 - 321
[25] Collaborative Learning Method for Natural Image Captioning
Wang, Rongzhao
Liu, Libo
DATA SCIENCE (ICPCSEE 2022), PT I, 2022, 1628 : 249 - 261
[26] A TextGCN-Based Decoding Approach for Improving Remote Sensing Image Captioning
Das, Swadhin
Sharma, Raksha
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2025, 22
[27] SCAP: enhancing image captioning through lightweight feature sifting and hierarchical decoding
Zhang, Yuhao
Tong, Jiaqi
Liu, Honglin
VISUAL COMPUTER, 2025
[28] Image and Video Captioning for Apparels Using Deep Learning
Agarwal, Govind
Jindal, Kritika
Chowdhury, Abishi
Singh, Vishal K.
Pal, Amrit
IEEE ACCESS, 2024, 12 : 113138 - 113150
[29] Learning Combinatorial Prompts for Universal Controllable Image Captioning
Wang, Zhen
Xiao, Jun
Zhuang, Yueting
Gao, Fei
Shao, Jian
Chen, Long
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (01) : 129 - 150
[30] Reinforcement Learning Transformer for Image Captioning Generation Model
Yan, Zhaojie
FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2022, 2023, 12701
