Large-scale models pre-trained on massive data have achieved state-of-the-art results in image captioning. However, the high cost of pre-training and fine-tuning has become a significant concern. In this paper, we propose PAEE, a parameter-efficient and data-effective image captioning model that generates captions based on the input image encoding and the knowledge obtained from a newly introduced Knowledge Prompter. In PAEE, the only module that needs to be learned is the Cross-modal Representation Aligner (CRA), introduced between the visual encoder and the language decoder, which helps the language model adapt to visual representations. The entire model greatly reduces the cost of pre-training and fine-tuning. Extensive experiments demonstrate that PAEE remains competitive with large-scale pre-trained models and comparable approaches while requiring far fewer trainable parameters. We design two new datasets to explore PAEE's data utilization ability and find that it can effectively exploit new data and achieve domain transfer without any training or fine-tuning. Additionally, we introduce the concept of small-data learning and find that PAEE is data-effective under limited computing resources, performing well even with few training samples.
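The parameter-efficiency claim above rests on freezing the pre-trained visual encoder and language decoder and training only the CRA. The following is a minimal illustrative sketch of that setup, not the authors' code: the module names and parameter counts are hypothetical, chosen only to show how the trainable fraction of the model stays small.

```python
# Illustrative sketch (hypothetical sizes, not the authors' implementation):
# only the Cross-modal Representation Aligner (CRA) is marked trainable,
# while the pre-trained visual encoder and language decoder stay frozen.
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    num_params: int
    trainable: bool


def trainable_params(modules):
    """Count only the parameters that would receive gradient updates."""
    return sum(m.num_params for m in modules if m.trainable)


# Hypothetical parameter counts for illustration only.
paee = [
    Module("visual_encoder", 86_000_000, trainable=False),    # frozen, pre-trained
    Module("language_decoder", 117_000_000, trainable=False),  # frozen, pre-trained
    Module("cra_aligner", 4_000_000, trainable=True),          # the only learned module
]

total = sum(m.num_params for m in paee)
print(f"trainable: {trainable_params(paee):,} / {total:,} parameters")
```

Because gradients flow only through the aligner, fine-tuning touches a small fraction of the full model, which is the source of the reduced training cost described above.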