MindLLM: Lightweight large language model pre-training, evaluation and domain application

Cited by: 0
Authors
Yang, Yizhe
Sun, Huashan
Li, Jiawei
Liu, Runheng
Li, Yinghao
Liu, Yuhang
Gao, Yang
Huang, Heyan [1]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
Source
AI OPEN | 2024 / Vol. 5
Funding
National Natural Science Foundation of China;
Keywords
Large language model; Lightweight; Bilingual;
DOI
10.1016/j.aiopen.2024.08.001
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While progress toward general artificial intelligence has largely been driven by increasingly large-scale models, an alternative branch is to develop lightweight custom models that better serve specific domains, given the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, which alleviates these burdens by offering models with 1.3 billion and 3 billion parameters. We give a thorough account of the experience accrued during large model development, covering every step of the process, including data construction, model architecture, evaluation, and applications. We hope these insights are valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of larger open-source models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored to smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
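The abstract describes pre-training lightweight bilingual causal language models from scratch. The sketch below illustrates that general setting with Hugging Face Transformers and PyTorch; the tiny GPT-2-style configuration, the placeholder "gpt2" tokenizer, and the two toy bilingual samples are assumptions for illustration only, not the authors' actual MindLLM architecture, tokenizer, or data pipeline.

import torch
from transformers import GPT2Config, GPT2LMHeadModel, AutoTokenizer

# Placeholder tokenizer; MindLLM uses its own bilingual vocabulary (not reproduced here).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tiny decoder-only configuration (~0.1B parameters), far below the 1.3B/3B MindLLM sizes.
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=512,
    n_embd=768,
    n_layer=12,
    n_head=12,
)
model = GPT2LMHeadModel(config)  # randomly initialized, i.e. trained "from scratch"

# Toy bilingual batch standing in for a mixed English/Chinese pre-training corpus.
batch = tokenizer(
    ["A short English pre-training sample.", "一条简短的中文预训练样本。"],
    return_tensors="pt",
    padding=True,
)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # exclude padding from the LM loss

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
outputs = model(**batch, labels=labels)  # causal (next-token) language-modeling loss
outputs.loss.backward()
optimizer.step()
print(f"one-step LM loss: {outputs.loss.item():.3f}")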
Pages: 155 - 180
Number of pages: 26
Related papers
50 records in total
  • [21] Chinese Text Open Domain Tag Generation Method via Large Language Model
    He, Chunhui
    Ge, Bin
    Zhang, Chong
    2024 10TH INTERNATIONAL CONFERENCE ON BIG DATA AND INFORMATION ANALYTICS, BIGDIA 2024, 2024, : 183 - 188
  • [22] Establishing best practices in large language model research: an application to repeat prompting
    Gallo, Robert J.
    Baiocchi, Michael
    Savage, Thomas R.
    Chen, Jonathan H.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, : 386 - 390
  • [23] Application of unified health large language model evaluation framework to In-Basket message replies: bridging qualitative and quantitative assessments
    Hong, Chuan
    Chowdhury, Anand
    Sorrentino, Anthony D.
    Wang, Haoyuan
    Agrawal, Monica
    Bedoya, Armando
    Bessias, Sophia
    Economou-Zavlanos, Nicoleta J.
    Wong, Ian
    Pean, Christian
    Li, Fan
Pollak, Kathryn I.
    Poon, Eric G.
    Pencina, Michael J.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2025, 32 (04) : 626 - 637
  • [24] Large Language Model Evaluation Criteria Framework in Healthcare: Fuzzy MCDM Approach
Alabool, Hamzeh Mohammad
SN COMPUTER SCIENCE, 6 (1)
  • [25] Evaluation of a Large Language Model's Ability to Assist in an Orthopedic Hand Clinic
    Kotzur, Travis
    Singh, Aaron
    Parker, John
    Peterson, Blaire
    Sager, Brian
    Rose, Ryan
    Corley, Fred
    Brady, Christina
HAND-AMERICAN ASSOCIATION FOR HAND SURGERY, 2024
  • [26] Research on Large Language Model Q&A Method Based on Specific Domain Regulation Documents
    Li, Wei
    Wang, Shimin
    Yao, Junyan
    PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND DIGITAL APPLICATIONS, MIDA2024, 2024, : 319 - 323
  • [27] Prompting large language model with context and pre-answer for knowledge-based VQA
    Hu, Zhongjian
    Yang, Peng
    Jiang, Yuanshuang
    Bai, Zijian
    PATTERN RECOGNITION, 2024, 151
  • [28] Automatic quantitative stroke severity assessment based on Chinese clinical named entity recognition with domain-adaptive pre-trained large language model
    Gu, Zhanzhong
    He, Xiangjian
    Yu, Ping
    Jia, Wenjing
    Yang, Xiguang
    Peng, Gang
    Hu, Penghui
    Chen, Shiyan
    Chen, Hongjie
    Lin, Yiguang
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 150
  • [29] Large language model evaluation for high-performance computing software development
    Godoy, William F.
    Valero-Lara, Pedro
    Teranishi, Keita
    Balaprakash, Prasanna
    Vetter, Jeffrey S.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, 36 (26)
  • [30] Development and Performance of a Large Language Model for the Quality Evaluation of Multi-Language Medical Imaging Guidelines and Consensus
    Wang, Zhixiang
    Sun, Jing
    Liu, Hui
    Luo, Xufei
    Li, Jia
    He, Wenjun
    Yang, Zhenhua
    Lv, Han
    Chen, Yaolong
    Wang, Zhenchang
    JOURNAL OF EVIDENCE BASED MEDICINE, 2025, 18 (02)