MindLLM: Lightweight large language model pre-training, evaluation and domain application

Cited by: 0

Authors:

Yang, Yizhe

Sun, Huashan

Li, Jiawei

Liu, Runheng

Li, Yinghao

Liu, Yuhang

Gao, Yang

Huang, Heyan [1]

Affiliation:

[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China

Source:

AI OPEN | 2024 / Vol. 5

Funding:

National Natural Science Foundation of China;

Keywords:

Large language model; Light weight; Bilingual;

DOI:

10.1016/j.aiopen.2024.08.001

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While general artificial intelligence is leveraged by developing increasingly large-scale models, there could be another branch to develop lightweight custom models that better serve certain domains, taking into account the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch, alleviating such burdens by offering models with 1.3 billion and 3 billion parameters. A thorough account of experiences accrued during large model development is given, covering every step of the process, including data construction, model architecture, evaluation, and applications. Such insights are hopefully valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.


Pages: 155 - 180

Page count: 26

50 items in total

[31] Multi-tool Integration Application for Math Reasoning Using Large Language Model
Duan, Zhihua
Wang, Jialin
2024 IEEE 10TH INTERNATIONAL CONFERENCE ON EDGE COMPUTING AND SCALABLE CLOUD, EDGECOM 2024, 2024, : 106 - 109
[32] LLM4SecHW: Leveraging Domain-Specific Large Language Model for Hardware Debugging
Fu, Weimin
Yang, Kaichen
Dutta, Raj Gautam
Guo, Xiaolong
Qu, Gang
2023 ASIAN HARDWARE ORIENTED SECURITY AND TRUST SYMPOSIUM, ASIANHOST, 2023
[33] Hardware Phi-1.5B: A Large Language Model Encodes Hardware Domain Specific Knowledge
Fu, Weimin
Li, Shijie
Zhao, Yifang
Ma, Haocheng
Dutta, Raj
Zhang, Xuan
Yang, Kaichen
Jin, Yier
Guo, Xiaolong
29TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC 2024, 2024, : 349 - 354
[34] Application of large language model combined with retrieval enhanced generation technology in digestive endoscopic nursing
Fu, Zhaoli
Fu, Siyuan
Huang, Yuan
He, Wenfang
Zhong, Zhuodan
Guo, Yan
Lin, Yanfeng
FRONTIERS IN MEDICINE, 2024, 11
[35] Using Large Language Model to Fill in Web Forms to Support Automated Web Application Testing
Chen, Feng-Kai
Liu, Chien-Hung
You, Shingchern D.
INFORMATION, 2025, 16 (02)
[36] Performance of the pre-trained large language model GPT-4 on automated short answer grading
Kortemeyer G.
Discover Artificial Intelligence, 2024, 4 (01)
[37] An Enhanced Retrieval Scheme for a Large Language Model with a Joint Strategy of Probabilistic Relevance and Semantic Association in the Vertical Domain
Chen, Qi
Zhou, Weifeng
Cheng, Jian
Yang, Ji
APPLIED SCIENCES-BASEL, 2024, 14 (24)
[38] Prompt matters: evaluation of large language model chatbot responses related to Peyronie's disease
Warren, Christopher J.
Edmonds, Victoria S.
Payne, Nicolette G.
Voletti, Sandeep
Wu, Sarah Y.
Colquitt, Jennakay
Sadeghi-Nejad, Hossein
Punjani, Nahid
SEXUAL MEDICINE, 2024, 12 (04)
[39] Investigating the Accuracy and Completeness of an Artificial Intelligence Large Language Model About Uveitis: An Evaluation of ChatGPT
Marshall, Rayna F.
Mallem, Krishna
Xu, Hannah
Thorne, Jennifer
Burkholder, Bryn
Chaon, Benjamin
Liberman, Paulina
Berkenstock, Meghan
OCULAR IMMUNOLOGY AND INFLAMMATION, 2024, 32 (09) : 2052 - 2055
[40] An Application Programming Interface (API) Sensitive Data Identification Method Based on the Federated Large Language Model
Wu, Jianping
Chen, Lifeng
Fang, Siyuan
Wu, Chunming
APPLIED SCIENCES-BASEL, 2024, 14 (22)
