MindLLM: Lightweight large language model pre-training, evaluation and domain application

Cited by: 0
Authors
Yang, Yizhe
Sun, Huashan
Li, Jiawei
Liu, Runheng
Li, Yinghao
Liu, Yuhang
Gao, Yang
Huang, Heyan [1]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
Source
AI OPEN | 2024 / Vol. 5
Funding
National Natural Science Foundation of China;
Keywords
Large language model; Lightweight; Bilingual;
DOI
10.1016/j.aiopen.2024.08.001
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While progress toward general artificial intelligence has largely been driven by increasingly large-scale models, an alternative branch is to develop lightweight custom models that better serve specific domains, given the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, which alleviates these burdens by offering models with 1.3 billion and 3 billion parameters. We give a thorough account of the experience accrued during large model development, covering every step of the process, including data construction, model architecture, evaluation, and applications. We hope these insights are valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of larger open-source models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored to smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
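The abstract describes pre-training lightweight bilingual causal language models from scratch. The sketch below illustrates that general setting with Hugging Face Transformers and PyTorch; the tiny GPT-2-style configuration, the placeholder "gpt2" tokenizer, and the two toy bilingual samples are assumptions for illustration only, not the authors' actual MindLLM architecture, tokenizer, or data pipeline.

import torch
from transformers import GPT2Config, GPT2LMHeadModel, AutoTokenizer

# Placeholder tokenizer; MindLLM uses its own bilingual vocabulary (not reproduced here).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tiny decoder-only configuration (~0.1B parameters), far below the 1.3B/3B MindLLM sizes.
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=512,
    n_embd=768,
    n_layer=12,
    n_head=12,
)
model = GPT2LMHeadModel(config)  # randomly initialized, i.e. trained "from scratch"

# Toy bilingual batch standing in for a mixed English/Chinese pre-training corpus.
batch = tokenizer(
    ["A short English pre-training sample.", "一条简短的中文预训练样本。"],
    return_tensors="pt",
    padding=True,
)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # exclude padding from the LM loss

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
outputs = model(**batch, labels=labels)  # causal (next-token) language-modeling loss
outputs.loss.backward()
optimizer.step()
print(f"one-step LM loss: {outputs.loss.item():.3f}")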
Pages: 155 - 180
Number of pages: 26
Related papers
50 records in total
  • [21] Chinese Text Open Domain Tag Generation Method via Large Language Model
    He, Chunhui
    Ge, Bin
    Zhang, Chong
    2024 10TH INTERNATIONAL CONFERENCE ON BIG DATA AND INFORMATION ANALYTICS, BIGDIA 2024, 2024, : 183 - 188
  • [22] Establishing best practices in large language model research: an application to repeat prompting
    Gallo, Robert J.
    Baiocchi, Michael
    Savage, Thomas R.
    Chen, Jonathan H.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, : 386 - 390
  • [23] Application of unified health large language model evaluation framework to In-Basket message replies: bridging qualitative and quantitative assessments
    Hong, Chuan
    Chowdhury, Anand
    Sorrentino, Anthony D.
    Wang, Haoyuan
    Agrawal, Monica
    Bedoya, Armando
    Bessias, Sophia
    Economou-Zavlanos, Nicoleta J.
    Wong, Ian
    Pean, Christian
    Li, Fan
Pollak, Kathryn I.
    Poon, Eric G.
    Pencina, Michael J.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2025, 32 (04) : 626 - 637
  • [24] Large Language Model Evaluation Criteria Framework in Healthcare: Fuzzy MCDM Approach
Alabool, Hamzeh Mohammad
SN COMPUTER SCIENCE, 6 (1)
  • [25] Evaluation of a Large Language Model's Ability to Assist in an Orthopedic Hand Clinic
    Kotzur, Travis
    Singh, Aaron
    Parker, John
    Peterson, Blaire
    Sager, Brian
    Rose, Ryan
    Corley, Fred
    Brady, Christina
HAND-AMERICAN ASSOCIATION FOR HAND SURGERY, 2024
  • [26] Research on Large Language Model Q&A Method Based on Specific Domain Regulation Documents
    Li, Wei
    Wang, Shimin
    Yao, Junyan
    PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND DIGITAL APPLICATIONS, MIDA2024, 2024, : 319 - 323
  • [27] Prompting large language model with context and pre-answer for knowledge-based VQA
    Hu, Zhongjian
    Yang, Peng
    Jiang, Yuanshuang
    Bai, Zijian
    PATTERN RECOGNITION, 2024, 151
  • [28] Automatic quantitative stroke severity assessment based on Chinese clinical named entity recognition with domain-adaptive pre-trained large language model
    Gu, Zhanzhong
    He, Xiangjian
    Yu, Ping
    Jia, Wenjing
    Yang, Xiguang
    Peng, Gang
    Hu, Penghui
    Chen, Shiyan
    Chen, Hongjie
    Lin, Yiguang
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 150
  • [29] Large language model evaluation for high-performance computing software development
    Godoy, William F.
    Valero-Lara, Pedro
    Teranishi, Keita
    Balaprakash, Prasanna
    Vetter, Jeffrey S.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, 36 (26)
  • [30] Development and Performance of a Large Language Model for the Quality Evaluation of Multi-Language Medical Imaging Guidelines and Consensus
    Wang, Zhixiang
    Sun, Jing
    Liu, Hui
    Luo, Xufei
    Li, Jia
    He, Wenjun
    Yang, Zhenhua
    Lv, Han
    Chen, Yaolong
    Wang, Zhenchang
    JOURNAL OF EVIDENCE BASED MEDICINE, 2025, 18 (02)