Fine-tuning large language models for improved health communication in low-resource languages

Cited by: 0

Authors:

Bui, Nhat [1]

Nguyen, Giang [1]

Nguyen, Nguyen [1]

Vo, Bao [1]

Vo, Luan [1]

Huynh, Tom [1]

Tang, Arthur [1]

Tran, Van Nhiem [2]

Huynh, Tuyen [3]

Nguyen, Huy Quang [3]

Dinh, Minh [1]

Affiliations:

[1] RMIT Univ, Sch Sci Engn & Technol, Ho Chi Minh City, Vietnam

[2] Hon Hai Res Inst, AI Res Ctr, Taipei 114699, Taiwan

[3] Oxford Univ Clin Res Unit OUCRU, Ho Chi Minh City, Vietnam

Source:

COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE | 2025 / Volume 263

Keywords:

Artificial intelligence; Large language model; Low-resources languages; Health communication and promotion; Data privacy and security; Health equity;

DOI:

10.1016/j.cmpb.2025.108655

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Background: This study illustrates a methodology for compiling training datasets to fine-tune Large Language Models (LLMs) for healthcare information in Vietnamese, a low-resource language. The objective is to bridge the gap in medical information accessibility and enhance healthcare communication in developing countries by adapting LLMs to specific linguistic nuances and domain needs.

Method: The methodology involves selecting a base model, compiling a domain-specific dataset, and fine-tuning the model with this dataset. Three open-source models were selected. The dataset, comprising approximately 337,000 prompt-response pairs in Vietnamese, was compiled from existing datasets, data crawled from Vietnamese medical online forums, and content distilled from Vietnamese medical textbooks. The three models were fine-tuned using the Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) techniques. Model performance was evaluated using BERTScore, ROUGE-L, and the "LLM-as-a-Judge" method.

Results: The fine-tuned models outperformed their base versions on BERTScore, ROUGE-L, and the "LLM-as-a-Judge" evaluation, confirming the effectiveness of the fine-tuning process. This study details the process of fine-tuning open-source LLMs for health information inquiries in Vietnamese, demonstrating its potential to improve healthcare communication in low-resource languages. Deploying the fine-tuned LLM on-premise enhances data privacy and security. However, the significant computing power and costs required pose challenges, especially for organizations in developing countries.

Conclusion: This case study highlights the unique challenges faced by developing countries using low-resource languages. Initiatives are needed to bridge healthcare gaps in underserved areas and contribute to global health equity.
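For readers unfamiliar with ROUGE-L, one of the metrics named in the abstract, the following is a minimal sketch of how such a score can be computed from the longest common subsequence (LCS) of tokens. It is not the authors' evaluation code. The function names `lcs_length` and `rouge_l`, the lowercase whitespace tokenization, and the use of the balanced F1 variant are all assumptions made for illustration; Vietnamese text in particular may call for a dedicated word segmenter.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Return (precision, recall, f1) of a ROUGE-L style score.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = rouge_l("the cat sat on the mat", "the cat lay on the mat")
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy pair the two sentences share a five-token subsequence out of six tokens each, so precision, recall, and F1 coincide. A fine-tuned model's answer would be scored the same way against a reference answer, then averaged over the evaluation set.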

Pages: 11
