Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate use

Cited by: 0
Authors
Belkis Nihan Coskun
Burcu Yagiz
Gokhan Ocakoglu
Ediz Dalkilic
Yavuz Pehlivan
Affiliations
[1] Bursa Uludag University, Division of Rheumatology, Department of Internal Medicine, Faculty of Medicine
[2] Bursa Uludag University, Department of Biostatistics, Faculty of Medicine
Source
Rheumatology International | 2024 / Vol. 44
Keywords
Accuracy; Artificial intelligence; Completeness; Large language models; Methotrexate
DOI
Not available
Abstract
We aimed to assess the accuracy and completeness of large language models (LLMs), namely ChatGPT 3.5 and 4, BARD, and Bing, when answering questions about methotrexate (MTX) use in the treatment of rheumatoid arthritis. We used 23 questions from an earlier study of MTX-related concerns. The questions were entered into each LLM, and two reviewers rated the generated responses for accuracy and completeness on Likert scales. The GPT models achieved a 100% correct answer rate, while BARD and Bing each scored 73.91%. For accuracy of the outputs (completely correct responses), GPT-4 scored 100%, GPT-3.5 scored 86.96%, and BARD and Bing each scored 60.87%. BARD produced 17.39% incorrect responses and 8.7% non-responses, while Bing produced 13.04% incorrect responses and 13.04% non-responses. The ChatGPT models produced significantly more accurate responses than Bing in the “mechanism of action” category, and the GPT-4 model showed significantly higher accuracy than BARD in the “side effects” category. There were no statistically significant differences among the models in the “lifestyle” category. For completeness, GPT-4 achieved a comprehensive-output rate of 100%, followed by GPT-3.5 at 86.96%, BARD at 60.86%, and Bing at 0%. In the “mechanism of action” category, both ChatGPT models and BARD produced significantly more comprehensive outputs than Bing. In the “side effects” and “lifestyle” categories, the ChatGPT models showed significantly higher completeness than Bing. The GPT models, particularly GPT-4, demonstrated superior performance in providing accurate and comprehensive patient information about MTX use. However, the study also identified inaccuracies and shortcomings in the generated responses.
Pages: 509-515
Page count: 6