Improved image captioning with subword units training and transformer

Cited by: 1
Authors
Cai Qiang (蔡强)
Li Jing
Li Haisheng
Zuo Min
Affiliations
[1] School of Computer and Information Engineering, Beijing Technology and Business University
[2] Beijing Key Laboratory of Big Data Technology for Food Safety
[3] National Engineering Laboratory for Agri-Product Quality Traceability
Keywords
image captioning; transformer; byte pair encoding (BPE); reinforcement learning
DOI
Not available
Chinese Library Classification (CLC) number
TP391.41
Discipline classification code
080203
Abstract
Image captioning models typically operate with a fixed vocabulary, but captioning is an open-vocabulary problem. Existing work handles out-of-vocabulary words by mapping them to an unknown token in the dictionary. In addition, the recurrent neural network (RNN) and its variants used for captioning have become a bottleneck for both generation quality and training time. To address these two problems, a simpler but more effective approach to open-vocabulary caption generation is proposed: the model is trained on subword units, and the long short-term memory (LSTM) unit is replaced with a transformer as the decoder for better caption quality and shorter training time. The effectiveness of different word segmentation vocabularies and the generation improvement of the transformer over LSTM are discussed, and it is shown that the improved models achieve state-of-the-art performance on the MSCOCO2014 image captioning task over a back-off dictionary baseline model.
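To make the subword idea concrete, the following is a minimal Python sketch of byte pair encoding (BPE), the segmentation method named in the keywords. It is not the authors' implementation: the function names `learn_bpe`, `merge_pair`, and `segment`, the `</w>` end-of-word marker, and the toy word counts are all assumptions made for illustration. The sketch learns merge operations from word frequencies and then applies them to a word that never appeared in the training counts, so the word is split into known subword units instead of being mapped to an unknown token.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: count} mapping."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself so runs are repeatable.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split `word` into subword units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical caption word counts, used only for illustration.
    toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    learned = learn_bpe(toy_counts, num_merges=10)
    # "lowest" is not in toy_counts, yet it is still segmented into known units.
    print(segment("lowest", learned))
```

The number of merges controls the vocabulary size: few merges give near character-level units, while many merges approach whole words. This is the kind of trade-off the abstract refers to when it compares different word segmentation vocabularies.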
Pages: 211-216
Page count: 6