From Large Language Models to Large Multimodal Models: A Literature Review

Cited by: 6
Authors
Huang, Dawei [1]
Yan, Chuan [2]
Li, Qing [3]
Peng, Xiaojiang [3]
Affiliations
[1] Shenzhen Univ, Coll Appl Sci, Shenzhen 518052, Peoples R China
[2] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
[3] Shenzhen Technol Univ, Coll Big Data & Internet, Shenzhen 518118, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 12
Funding
National Natural Science Foundation of China
Keywords
large language models (LLMs); large multimodal models (LMMs); artificial intelligence
DOI
10.3390/app14125068
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
With the deepening of research on Large Language Models (LLMs), significant progress has been made in recent years on the development of Large Multimodal Models (LMMs), which are gradually moving toward Artificial General Intelligence. This paper aims to summarize the recent progress from LLMs to LMMs in a comprehensive and unified way. First, we start with LLMs and outline various conceptual frameworks and key techniques. Then, we focus on the architectural components, training strategies, fine-tuning guidance, and prompt engineering of LMMs, and present a taxonomy of the latest vision-language LMMs. Finally, we provide a summary of both LLMs and LMMs from a unified perspective, make an analysis of the development status of large-scale models in the view of globalization, and offer potential research directions for large-scale models.
Pages: 30