Self-Distillation for Few-Shot Image Captioning

Cited by: 16
Authors
Chen, Xianyu [1]
Jiang, Ming [1]
Zhao, Qi [1]
Affiliations
[1] Univ Minnesota Twin Cities, Minneapolis, MN 55455 USA
Source
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021) | 2021
DOI
10.1109/WACV48630.2021.00059
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Building large-scale image-captioning datasets is expensive, while abundant unpaired images and text corpora can potentially reduce the manual annotation effort. In this paper, we study the few-shot image captioning problem, which requires only a small number of annotated image-caption pairs. We propose an ensemble-based self-distillation method that allows image captioning models to be trained with unpaired images and captions. The ensemble consists of multiple base models trained with different data samples in each iteration. To learn from unpaired images, we generate multiple pseudo captions with the ensemble and weight them according to their confidence levels. To learn from unpaired captions, we propose a simple yet effective pseudo feature generation method based on gradient descent. The pseudo captions and pseudo features from the ensemble are used to train the base models in future iterations. The proposed method is general across different image captioning models and datasets. Our experiments demonstrate significant performance improvements and meaningful captions generated with only 1% of paired training data.
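The two ideas named in the abstract, confidence-based weighting of pseudo captions and pseudo feature generation by gradient descent, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the weighting rule, the loss, or the model, so the softmax weighting, the squared-error objective, the frozen linear map standing in for a captioning model, and the names confidence_weights and pseudo_feature are all assumptions made for illustration.

```python
import math


def confidence_weights(log_probs, temperature=1.0):
    """Turn per-caption log-probabilities into weights that sum to 1.

    Assumption: a softmax over confidence scores; the paper's actual
    weighting rule is not stated in the abstract.
    """
    scaled = [lp / temperature for lp in log_probs]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_feature(target, weight_rows, steps=500, lr=0.1):
    """Find a feature vector x by gradient descent with the model frozen.

    Assumption: a toy linear "model" y = W x and the loss ||W x - target||^2
    stand in for a real captioning network and its caption loss.
    """
    dim = len(weight_rows[0])
    x = [0.0] * dim
    for _ in range(steps):
        # Residual of the frozen model's output against the target.
        residual = [
            sum(w * xi for w, xi in zip(row, x)) - t
            for row, t in zip(weight_rows, target)
        ]
        # Gradient of the squared error with respect to x: 2 * W^T residual.
        grad = [
            2.0 * sum(r * row[j] for r, row in zip(residual, weight_rows))
            for j in range(dim)
        ]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x


if __name__ == "__main__":
    # Three hypothetical pseudo captions scored by an ensemble.
    print(confidence_weights([-1.0, -2.0, -4.0]))
    # Recover a feature for a hypothetical target under an identity model.
    print(pseudo_feature([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch only the input feature is updated while the model stays fixed, which is the general sense in which a feature can be "generated" for an unpaired caption; how the paper combines the resulting pseudo pairs during training is not specified in this record.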
Pages: 545-555
Number of pages: 11