VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

Cited by: 96

Authors:

Chen, Jun [1]

Guo, Han [2]

Yi, Kai [1]

Li, Boyang [3]

Elhoseiny, Mohamed [1]

Affiliations:

[1] King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia

[2] Carnegie Mellon University, Pittsburgh, PA 15213, USA

[3] Nanyang Technological University, Singapore

Source:

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022) | 2022

Funding:

National Research Foundation, Singapore

Keywords:

DOI:

10.1109/CVPR52688.2022.01750

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The limited availability of annotated data often hinders real-world applications of machine learning. To efficiently learn from small quantities of multimodal data, we leverage the linguistic knowledge from a large pre-trained language model (PLM) and quickly adapt it to new domains of image captioning. To effectively utilize a pretrained model, it is critical to balance the visual input and prior linguistic knowledge from pretraining. We propose VisualGPT, which employs a novel self-resurrecting encoder-decoder attention mechanism to quickly adapt the PLM with a small amount of in-domain image-text data. The proposed self-resurrecting activation unit produces sparse activations that prevent accidental overwriting of linguistic knowledge. When trained on 0.1%, 0.5% and 1% of the respective training sets, VisualGPT surpasses the best baseline by up to 10.0% CIDEr on MS COCO [43] and 17.9% CIDEr on Conceptual Captions [63]. Furthermore, VisualGPT achieves the state-of-the-art result on IU X-ray [15], a medical report generation dataset. Our code is available at https://github.com/vision-CAIR/VisualGPT.
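
The abstract only states that the activation unit yields sparse activations that balance visual input against pretrained linguistic knowledge. As a rough illustration of that idea, and not the authors' implementation, the sketch below builds two complementary gates from a sigmoid and zeroes any gate value that falls at or below a threshold. The function names, the threshold `tau`, and its default value are assumptions made for this example; the linked repository is the reference for the actual method.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def sparse_complementary_gates(pre_activations, tau=0.2):
    """Return (visual_gate, language_gate) for each pre-activation.

    Assumption for illustration: a gate value is kept only when it
    exceeds `tau`; otherwise it is set to zero, which makes the gates sparse.
    """
    visual_gate, language_gate = [], []
    for h in pre_activations:
        s = sigmoid(h)
        visual_gate.append(s if s > tau else 0.0)
        language_gate.append(1.0 - s if (1.0 - s) > tau else 0.0)
    return visual_gate, language_gate


def mix(visual_features, language_features, pre_activations, tau=0.2):
    """Blend visual and language features element-wise with the sparse gates."""
    vis, lan = sparse_complementary_gates(pre_activations, tau)
    return [
        g_v * v + g_l * l
        for g_v, v, g_l, l in zip(vis, visual_features, lan, language_features)
    ]


if __name__ == "__main__":
    gates = sparse_complementary_gates([-5.0, 0.0, 5.0])
    print(gates)
    print(mix([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [-5.0, 0.0, 5.0]))
```

With strongly negative or strongly positive pre-activations, one gate is exactly zero, so the corresponding position passes only language features or only visual features; near zero, both gates stay active and the two sources are blended.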

Pages: 18009-18019

Page count: 11

References (83 in total):

[11] [Anonymous], 2019, CVPR, DOI 10.1109/CVPR.2019.00138

[12] [Anonymous], 2017, CVPR, DOI 10.1109/CVPR.2017.681

[13] [Anonymous], 2018, ECCV, DOI 10.1007/978-3-030-04070-39

[14] [Anonymous], 2018, NEURIPS

[15] [Anonymous], 2018, ECCV, DOI 10.1007/978-3-030-01246-5_31

[16] [Anonymous], 2019, CVPR, DOI 10.1109/CVPR.2019.01094

[17] Ba J. L., 2016, Advances in Neural Information Processing Systems (NeurIPS), P1

[18] Embedding a cluster-based overlay mesh in mobile ad hoc networks without cluster heads [J]. Banerjee, A; King, CT; Hsiao, HC. 2005 International Conference on Parallel Processing, Proceedings, 2005: 49-56

[19] Brown TB, 2020, ADV NEUR IN, V33

[20] Chen L. C., 2017, RETHINKING ATROUS CO
