MindLLM: Lightweight large language model pre-training, evaluation and domain application

Cited by: 0
|
Authors
Yang, Yizhe
Sun, Huashan
Li, Jiawei
Liu, Runheng
Li, Yinghao
Liu, Yuhang
Gao, Yang
Huang, Heyan [1]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
Source
AI OPEN | 2024 / Vol. 5
Funding
National Natural Science Foundation of China;
Keywords
Large language model; Lightweight; Bilingual
DOI
10.1016/j.aiopen.2024.08.001
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While such progress is largely driven by ever larger models, the high cost of training and deploying LLMs and the scarcity of computational resources motivate a complementary branch: lightweight custom models that better serve specific domains. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, which alleviates these burdens by offering models with 1.3 billion and 3 billion parameters. We give a thorough account of the experience accrued during large model development, covering every step of the process, including data construction, model architecture, evaluation, and applications. We hope these insights are valuable to fellow academics and developers. MindLLM consistently matches or surpasses the performance of larger open-source models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
Pages: 155-180
Page count: 26
Related Papers
50 records in total
  • [1] Research on the Training and Application Methods of a Lightweight Agricultural Domain-Specific Large Language Model Supporting Mandarin Chinese and Uyghur
    Pan, Kun
    Zhang, Xiaogang
    Chen, Liping
    APPLIED SCIENCES-BASEL, 2024, 14 (13):
  • [2] Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications
    Xie, Han
    Zheng, Da
    Ma, Jun
    Zhang, Houyu
    Ioannidis, Vassilis N.
    Song, Xiang
    Ping, Qing
    Wang, Sheng
    Yang, Carl
    Xu, Yi
    Zeng, Belinda
    Chilimbi, Trishul
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5270 - 5281
  • [3] JiuZhou: open foundation language models and effective pre-training framework for geoscience
    Chen, Zhou
    Lin, Ming
    Zang, Mingrun
    Wang, Zimeng
    Li, Juanzi
    Bai, Yuqi
    INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2025, 18 (01)
  • [4] MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
    Zhao, Hang
    Xing, Yifei
    Yu, Zhesong
    Zhu, Bilei
    Lu, Lu
    Ma, Zejun
    INTERSPEECH 2024, 2024, : 52 - 56
  • [5] Innovators and transformers: enhancing supply chain employee training with an innovative application of a large language model
    Gezdur, Arda
    Bhattacharjya, Jyotirmoyee
    INTERNATIONAL JOURNAL OF PHYSICAL DISTRIBUTION & LOGISTICS MANAGEMENT, 2025,
  • [6] RDscan: Extracting RNA-disease relationship from the literature based on pre-training model
    Zhang, Yang
    Yang, Yu
    Ren, Liping
    Ning, Lin
    Zou, Quan
    Luo, Nanchao
    Zhang, Yinghui
    Liu, Ruijun
    METHODS, 2024, 228 : 48 - 54
  • [7] WaterGPT: Training a Large Language Model to Become a Hydrology Expert
    Ren, Yi
    Zhang, Tianyi
    Dong, Xurong
    Li, Weibin
    Wang, Zhiyang
    He, Jie
    Zhang, Hanzhi
    Jiao, Licheng
    WATER, 2024, 16 (21)
  • [8] Alibaba HPN: A Data Center Network for Large Language Model Training
    Qian, Kun
    Xi, Yongqing
    Cao, Jiamin
    Gao, Jiaqi
    Xu, Yichi
    Guan, Yu
    Fu, Binzhang
    Shi, Xuemei
    Zhu, Fangbo
    Miao, Rui
    Wang, Chao
    Wang, Peng
    Zhang, Pengcheng
    Zeng, Xianlong
    Ruan, Eddie
    Yao, Zhiping
    Zhai, Ennan
    Cai, Dennis
    PROCEEDINGS OF THE 2024 ACM SIGCOMM 2024 CONFERENCE, ACM SIGCOMM 2024, 2024, : 691 - 706
  • [9] Domain Knowledge Distillation from Large Language Model: An Empirical Study in the Autonomous Driving Domain
    Tang, Yun
    da Costa, Antonio A. Bruto
    Zhang, Xizhe
    Patrick, Irvine
    Khastgir, Siddartha
    Jennings, Paul
    2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023, : 3893 - 3900
  • [10] PreparedLLM: effective pre-pretraining framework for domain-specific large language models
    Chen, Zhou
    Lin, Ming
    Wang, Zimeng
    Zang, Mingrun
    Bai, Yuqi
    BIG EARTH DATA, 2024, 8 (04) : 649 - 672