Fine-tuning large language models for improved health communication in low-resource languages

Cited by: 0
Authors
Bui, Nhat [1]
Nguyen, Giang [1]
Nguyen, Nguyen [1]
Vo, Bao [1]
Vo, Luan [1]
Huynh, Tom [1]
Tang, Arthur [1]
Tran, Van Nhiem [2]
Huynh, Tuyen [3]
Nguyen, Huy Quang [3]
Dinh, Minh [1]
Affiliations
[1] RMIT Univ, Sch Sci Engn & Technol, Ho Chi Minh City, Vietnam
[2] Hon Hai Res Inst, AI Res Ctr, Taipei 114699, Taiwan
[3] Oxford Univ Clin Res Unit OUCRU, Ho Chi Minh City, Vietnam
Keywords
Artificial intelligence; Large language model; Low-resources languages; Health communication and promotion; Data privacy and security; Health equity
DOI
10.1016/j.cmpb.2025.108655
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Background: This study presents a methodology for compiling training datasets to fine-tune Large Language Models (LLMs) for healthcare information in Vietnamese, a low-resource language. The objective is to bridge the gap in medical information accessibility and enhance healthcare communication in developing countries by adapting LLMs to specific linguistic nuances and domain needs.

Method: The methodology involves selecting a base model, compiling a domain-specific dataset, and fine-tuning the model with this dataset. Three open-source models were selected. The dataset, comprising approximately 337,000 prompt-response pairs in Vietnamese, was compiled from existing datasets, data crawled from Vietnamese online medical forums, and content distilled from Vietnamese medical textbooks. The three models were fine-tuned using the Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) techniques. Model performance was evaluated using BERTScore, ROUGE-L, and the "LLM-as-a-Judge" method.

Results: The fine-tuned models outperformed their base versions on BERTScore, ROUGE-L, and the "LLM-as-a-Judge" method, confirming the effectiveness of the fine-tuning process. This study details the process of fine-tuning open-source LLMs for health information inquiries in Vietnamese, demonstrating its potential to improve healthcare communication in low-resource languages. Deploying the fine-tuned LLM on-premise enhances data privacy and security. However, the significant computing power and costs required pose challenges, especially for organizations in developing countries.

Conclusion: This case study highlights the unique challenges faced by developing countries using low-resource languages. Initiatives are needed to bridge healthcare gaps in underserved areas and contribute to global health equity.
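
For readers unfamiliar with the ROUGE-L metric named in the abstract, the following is a minimal sketch of how a ROUGE-L F1 score can be computed from the longest common subsequence (LCS) of a reference and a candidate answer. It is not the authors' evaluation code. The function names `lcs_length` and `rouge_l_f1` are hypothetical, whitespace tokenization is an assumption (Vietnamese text would normally need proper word segmentation), and the balanced F1 weighting is also an assumption.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between two strings (assumption: whitespace tokens)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a candidate identical to the reference scores 1.0, and a candidate that shares no tokens with it scores 0.0.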
Pages: 11
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MODELING, NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING, CMNM 2024, 2024, : 33 - 38