VL-Few: Vision Language Alignment for Multimodal Few-Shot Meta Learning

Cited by: 0
Authors
Ma, Han [1 ]
Fan, Baoyu [1 ]
Ng, Benjamin K. [1 ]
Lam, Chan-Tong [1 ]
Affiliations
[1] Macao Polytech Univ, Fac Appl Sci, Macau 999078, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, No. 03
Keywords
vision language learning; representation alignment; multimodal learning; meta learning; few-shot learning; visual question answering;
DOI
10.3390/app14031169
Chinese Library Classification: O6 [Chemistry]
Discipline code: 0703
Abstract
Complex real-world tasks, such as visual question answering (VQA), involve multiple modalities. However, traditional multimodal learning requires large amounts of aligned data, such as image-text pairs, and constructing such training data at scale is a major challenge. We therefore propose VL-Few, a simple and effective method for the multimodal few-shot problem. VL-Few (1) proposes modal alignment, which maps visual features into the language space through a lightweight network and improves the model's multimodal understanding; (2) adopts few-shot meta learning for the multimodal problem, constructing a pool of few-shot meta tasks to improve the model's generalization; (3) proposes semantic alignment to enhance the model's semantic understanding of the task, context, and demonstrations; (4) proposes task alignment, which casts the training data into the form of the target task and improves the model's task understanding; (5) proposes generation alignment, which adopts token-level training and a multitask fusion loss to improve the model's generation ability. Our experimental results show the effectiveness of VL-Few on multimodal few-shot problems.
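The two core mechanisms named in the abstract, projecting visual features into the language space through a lightweight network and building a pool of N-way K-shot meta tasks, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, class labels, and function names are assumptions made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a vision encoder emitting 768-d patch features,
# projected into a 512-d language-model embedding space.
VIS_DIM, LANG_DIM = 768, 512


class VisualProjector:
    """Lightweight linear map aligning visual features to language space."""

    def __init__(self, vis_dim: int, lang_dim: int):
        self.W = rng.normal(0.0, 0.02, size=(vis_dim, lang_dim))
        self.b = np.zeros(lang_dim)

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        # feats: (num_patches, vis_dim) -> (num_patches, lang_dim)
        return feats @ self.W + self.b


def sample_episode(dataset: dict, n_way: int = 2, k_shot: int = 1, q_query: int = 1):
    """Draw one N-way K-shot episode (support + query sets) from a
    {label: [examples]} mapping; a meta task pool is many such episodes."""
    classes = rng.choice(list(dataset), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(len(dataset[c]))[: k_shot + q_query]
        support += [(dataset[c][i], c) for i in idx[:k_shot]]
        query += [(dataset[c][i], c) for i in idx[k_shot:]]
    return support, query


if __name__ == "__main__":
    proj = VisualProjector(VIS_DIM, LANG_DIM)
    tokens = proj(np.ones((16, VIS_DIM)))  # 16 patch features -> language space
    print(tokens.shape)

    data = {c: [f"img_{c}{i}" for i in range(5)] for c in "abcd"}
    support, query = sample_episode(data, n_way=3, k_shot=2, q_query=1)
    print(len(support), len(query))
```

In this sketch the projector's output vectors would be prepended to the language model's token embeddings, so the frozen language model can attend to visual content; training episodes drawn from the pool stand in for the large aligned corpora that conventional multimodal training requires.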
Pages: 19