MindLLM: Lightweight large language model pre-training, evaluation and domain application

Times cited: 0
Authors
Yang, Yizhe
Sun, Huashan
Li, Jiawei
Liu, Runheng
Li, Yinghao
Liu, Yuhang
Gao, Yang
Huang, Heyan [1]
Affiliation
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
Source
AI OPEN | 2024, Vol. 5
Funding
National Natural Science Foundation of China
Keywords
Large language model; Light weight; Bilingual;
DOI
10.1016/j.aiopen.2024.08.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language tasks, marking significant strides toward general artificial intelligence. While general artificial intelligence is being pursued by developing ever larger models, another branch could develop lightweight custom models that better serve specific domains, given the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, which alleviates these burdens by offering models with 1.3 billion and 3 billion parameters. We give a thorough account of the experience accrued during large-model development, covering every step of the process: data construction, model architecture, evaluation, and applications. We hope these insights will be valuable to fellow academics and developers. MindLLM consistently matches or surpasses the performance of other, larger open-source models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored to smaller models that enhances their capabilities efficiently. Moreover, we explore applications of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
Pages: 155-180
Page count: 26