Fine-tuning your answers: a bag of tricks for improving VQA models

Cited by: 0
Authors
Arroyo, Roberto [1 ]
Alvarez, Sergio [1 ]
Aller, Aitor [1 ]
Bergasa, Luis M. [2 ]
Ortiz, Miguel E. [2 ]
Affiliations
[1] NielsenIQ, Madrid, Spain
[2] Universidad de Alcalá (UAH), Electronics Department, Madrid, Spain
Keywords
Computer vision; Natural language processing; Knowledge representation & reasoning; Visual question answering; Artificial intelligence
DOI
10.1007/s11042-021-11546-z
CLC number
TP [Automation Technology, Computer Technology]
Subject classification
0812 (Computer Science & Technology)
Abstract
In this paper, one of the most novel topics in Deep Learning (DL) is explored: Visual Question Answering (VQA). This research area combines three of the most important fields in Artificial Intelligence (AI) to automatically provide natural language answers to questions that a user can ask about an image. These fields are: 1) Computer Vision (CV), 2) Natural Language Processing (NLP) and 3) Knowledge Representation & Reasoning (KR&R). Initially, a review of the state of the art in VQA and our contributions to it are discussed. Then, we build upon the ideas of Pythia, one of the most outstanding approaches in the field. A study of Pythia's architecture is carried out with the aim of presenting varied enhancements over the original proposal, fine-tuning models using a bag of tricks. Several training strategies are compared in order to increase global accuracy and to understand the limitations associated with VQA models. Extensive results assess the impact of the different tricks on our enhanced architecture, together with additional qualitative results.
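To make the VQA task described above concrete, the following minimal Python sketch answers a natural-language question about an image. It is an illustrative assumption only, not the Pythia-based architecture studied in the paper: it uses the Hugging Face transformers visual-question-answering pipeline with the publicly available ViLT checkpoint dandelin/vilt-b32-finetuned-vqa, and the image URL is a placeholder.

# Minimal VQA sketch (illustrative; NOT the Pythia-based system from this paper).
# Requires: pip install transformers torch pillow requests
from transformers import pipeline
from PIL import Image
import requests

# Load a pretrained visual-question-answering pipeline with an off-the-shelf ViLT model.
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

# Fetch any RGB image; this COCO validation image URL is a placeholder.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Ask a natural-language question about the image; the pipeline returns
# a list of {"answer", "score"} candidates sorted by confidence.
result = vqa(image=image, question="How many cats are in the picture?")
print(result[0]["answer"], result[0]["score"])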
Pages: 26889-26913
Page count: 25
Related papers (34 in total)
  • [1] Fine-tuning your answers: a bag of tricks for improving VQA models
    Arroyo, Roberto
    Álvarez, Sergio
    Aller, Aitor
    Bergasa, Luis M.
    Ortiz, Miguel E.
    Multimedia Tools and Applications, 2022, 81: 26889-26913
  • [2] Enhancing Automatic Speech Recognition With Personalized Models: Improving Accuracy Through Individualized Fine-Tuning
    Brydinskyi, Vitalii
    Sabodashko, Dmytro
    Khoma, Yuriy
    Podpora, Michal
    Konovalov, Alexander
    Khoma, Volodymyr
    IEEE Access, 2024, 12: 116649-116656
  • [3] Getting it right: the limits of fine-tuning large language models
    Browning, Jacob
    Ethics and Information Technology, 2024, 26 (02)
  • [4] Improving optimization of convolutional neural networks through parameter fine-tuning
    Becherer, Nicholas
    Pecarina, John
    Nykl, Scott
    Hopkinson, Kenneth
    Neural Computing & Applications, 2019, 31 (08): 3469-3479
  • [5] Breaking the Barrier Between Pre-training and Fine-tuning: A Hybrid Prompting Model for Knowledge-Based VQA
    Sun, Zhongfan
    Hu, Yongli
    Gao, Qingqing
    Jiang, Huajie
    Gao, Junbin
    Sun, Yanfeng
    Yin, Baocai
    Proceedings of the 31st ACM International Conference on Multimedia (MM 2023), 2023: 4065-4073
  • [6] Generative Models for Source Code: Fine-Tuning Techniques for Structured Pattern Learning
    Franzoni, Valentina
    Tagliente, Silvia
    Milani, Alfredo
    Technologies, 2024, 12 (11)
  • [7] Parameter-efficient fine-tuning in large language models: a survey of methodologies
    Wang, Luping
    Chen, Sheng
    Jiang, Linnan
    Pan, Shu
    Cai, Runze
    Yang, Sen
    Yang, Fei
    Artificial Intelligence Review, 2025, 58 (08)
  • [8] Enhanced Discriminative Fine-Tuning of Large Language Models for Chinese Text Classification
    Song, Jinwang
    Zan, Hongying
    Zhang, Kunli
    2024 International Conference on Asian Language Processing (IALP 2024), 2024: 168-174
  • [9] Exploiting Syntactic Information to Boost the Fine-tuning of Pre-trained Models
    Liu, Chaoming
    Zhu, Wenhao
    Zhang, Xiaoyu
    Zhai, Qiuhong
    2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC 2022), 2022: 575-582