Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study

Cited by: 1
Authors
Wang, Dingqiao [1]
Liang, Jiangbo [1]
Ye, Jinguo [1]
Li, Jingni [1]
Li, Jingpeng [1]
Zhang, Qikai [1]
Hu, Qiuling [1]
Pan, Caineng [1]
Wang, Dongliang [1]
Liu, Zhong [1]
Shi, Wen [1]
Shi, Danli [2]
Li, Fei [1]
Qu, Bo [3]
Zheng, Yingfeng [1]
Affiliations
[1] Sun Yat sen Univ, Zhongshan Ophthalm Ctr, Guangdong Prov Clin Res Ctr Ocular Dis, State Key Lab Ophthalmol, Guangdong Prov Key Lab Op, 07 Jinsui Rd, Guangzhou 510060, Peoples R China
[2] Hong Kong Polytech Univ, Res Ctr SHARP Vis, Hong Kong, Peoples R China
[3] Peking Univ Third Hosp, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
large language models; LLMs; retrieval-augmented generation; RAG; GPT-4.0; Claude-2; Google Bard; diabetes education
DOI
10.2196/58041
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Large language models (LLMs) demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries.
Objective: This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve the LLM's performance to accurately and safely respond to diabetes-related inquiries.
Methods: The RISE, an innovative retrieval augmentation framework, comprises 4 steps: rewriting query, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions respectively. Assessments were conducted by clinicians for accuracy and comprehensiveness and by patients for understandability.
Results: The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the percentage of accurate responses increased by 12% (15/129) with RISE. Specifically, the rates of accurate responses increased by 7% (3/43) for GPT-4, 19% (8/43) for Claude 2, and 9% (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability was also enhanced by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023 to February 5, 2024.
Conclusions: The RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving medical resource pressures and raising public awareness of medical knowledge.
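As a quick arithmetic check on the accuracy figures in the Results, the short Python sketch below recomputes the rounded percentages from the reported fractions. It assumes each fraction counts the additional accurate responses out of 43 questions per model; the variable and function names are illustrative and do not come from the paper.

```python
# Additional accurate responses with RISE, per model, as reported in the Results.
# Assumption: each count is out of the same 43 diabetes-related questions.
gains = {"GPT-4": 3, "Claude 2": 8, "Google Bard": 4}
QUESTIONS_PER_MODEL = 43


def pct(numerator: int, denominator: int) -> int:
    """Return the fraction as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# Per-model increase, in percentage points.
per_model = {model: pct(gain, QUESTIONS_PER_MODEL) for model, gain in gains.items()}

# Pooled increase across all three models (15 of 129 responses).
total_gain = sum(gains.values())
total_questions = QUESTIONS_PER_MODEL * len(gains)
overall = pct(total_gain, total_questions)

print(per_model)
print(f"{total_gain}/{total_questions} = {overall}%")
```

The pooled figure is the sum of the per-model counts over 3 × 43 = 129 responses, which is why the abstract reports 15/129 alongside the three per-model fractions.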
Pages: 12