Fine-tuning your answers: a bag of tricks for improving VQA models

Cited by: 0
Authors
Arroyo, Roberto [1 ]
Alvarez, Sergio [1 ]
Aller, Aitor [1 ]
Bergasa, Luis M. [2 ]
Ortiz, Miguel E. [2 ]
Affiliations
[1] NielsenIQ, Madrid, Spain
[2] Universidad de Alcalá (UAH), Electronics Department, Madrid, Spain
Keywords
Computer vision; Natural language processing; Knowledge representation & reasoning; Visual question answering; Artificial intelligence
DOI
10.1007/s11042-021-11546-z
CLC number
TP [Automation Technology, Computer Technology]
Subject classification
0812 (Computer Science & Technology)
Abstract
In this paper, one of the most novel topics in Deep Learning (DL) is explored: Visual Question Answering (VQA). This research area combines three of the most important fields in Artificial Intelligence (AI) to automatically provide natural language answers to questions that a user can ask about an image. These fields are: 1) Computer Vision (CV), 2) Natural Language Processing (NLP) and 3) Knowledge Representation & Reasoning (KR&R). Initially, a review of the state of the art in VQA and our contributions to it are discussed. Then, we build upon the ideas of Pythia, one of the most outstanding approaches in the field. A study of Pythia's architecture is carried out with the aim of presenting varied enhancements over the original proposal, fine-tuning models using a bag of tricks. Several training strategies are compared in order to increase global accuracy and to understand the limitations associated with VQA models. Extensive results assess the impact of the different tricks on our enhanced architecture, together with additional qualitative results.
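To make the VQA task described above concrete, the following minimal Python sketch answers a natural-language question about an image. It is an illustrative assumption only, not the Pythia-based architecture studied in the paper: it uses the Hugging Face transformers visual-question-answering pipeline with the publicly available ViLT checkpoint dandelin/vilt-b32-finetuned-vqa, and the image URL is a placeholder.

# Minimal VQA sketch (illustrative; NOT the Pythia-based system from this paper).
# Requires: pip install transformers torch pillow requests
from transformers import pipeline
from PIL import Image
import requests

# Load a pretrained visual-question-answering pipeline with an off-the-shelf ViLT model.
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

# Fetch any RGB image; this COCO validation image URL is a placeholder.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Ask a natural-language question about the image; the pipeline returns
# a list of {"answer", "score"} candidates sorted by confidence.
result = vqa(image=image, question="How many cats are in the picture?")
print(result[0]["answer"], result[0]["score"])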
Pages: 26889-26913
Page count: 25
Related papers (34 in total)
  • [1] Fine-tuning your answers: a bag of tricks for improving VQA models
    Arroyo, Roberto
    Álvarez, Sergio
    Aller, Aitor
    Bergasa, Luis M.
    Ortiz, Miguel E.
    Multimedia Tools and Applications, 2022, 81: 26889-26913
  • [2] Enhancing Automatic Speech Recognition With Personalized Models: Improving Accuracy Through Individualized Fine-Tuning
    Brydinskyi, Vitalii
    Sabodashko, Dmytro
    Khoma, Yuriy
    Podpora, Michal
    Konovalov, Alexander
    Khoma, Volodymyr
    IEEE Access, 2024, 12: 116649-116656
  • [3] Getting it right: the limits of fine-tuning large language models
    Browning, Jacob
    Ethics and Information Technology, 2024, 26 (02)
  • [4] Improving optimization of convolutional neural networks through parameter fine-tuning
    Becherer, Nicholas
    Pecarina, John
    Nykl, Scott
    Hopkinson, Kenneth
    Neural Computing & Applications, 2019, 31 (08): 3469-3479
  • [5] Breaking the Barrier Between Pre-training and Fine-tuning: A Hybrid Prompting Model for Knowledge-Based VQA
    Sun, Zhongfan
    Hu, Yongli
    Gao, Qingqing
    Jiang, Huajie
    Gao, Junbin
    Sun, Yanfeng
    Yin, Baocai
    Proceedings of the 31st ACM International Conference on Multimedia (MM 2023), 2023: 4065-4073
  • [6] Generative Models for Source Code: Fine-Tuning Techniques for Structured Pattern Learning
    Franzoni, Valentina
    Tagliente, Silvia
    Milani, Alfredo
    Technologies, 2024, 12 (11)
  • [7] Parameter-efficient fine-tuning in large language models: a survey of methodologies
    Wang, Luping
    Chen, Sheng
    Jiang, Linnan
    Pan, Shu
    Cai, Runze
    Yang, Sen
    Yang, Fei
    Artificial Intelligence Review, 2025, 58 (08)
  • [8] Enhanced Discriminative Fine-Tuning of Large Language Models for Chinese Text Classification
    Song, Jinwang
    Zan, Hongying
    Zhang, Kunli
    2024 International Conference on Asian Language Processing (IALP 2024), 2024: 168-174
  • [9] Exploiting Syntactic Information to Boost the Fine-tuning of Pre-trained Models
    Liu, Chaoming
    Zhu, Wenhao
    Zhang, Xiaoyu
    Zhai, Qiuhong
    2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC 2022), 2022: 575-582