Cost, Usability, Credibility, Fairness, Accountability, Transparency, and Explainability Framework for Safe and Effective Large Language Models in Medical Education: Narrative Review and Qualitative Study

Cited by: 1

Authors:

Quttainah, Majdi [1]

Mishra, Vinaytosh [2]

Madakam, Somayya [3]

Lurie, Yotam [4]

Mark, Shlomo [5]

Affiliations:

[1] Kuwait Univ, Coll Business Adm, Kuwait, Kuwait

[2] Gulf Med Univ, Coll Healthcare Management & Econ, Al Jurf 1, Ajman 4184, U Arab Emirates

[3] Birla Inst Management Technol, Informat Technol, Knowledge Pk-II, Greater Noida, India

[4] Bengurion Univ Negev, Dept Management, Negev, Israel

[5] Shamoon Coll Engn, Dept Software Engn, Ashdod, Israel

Source:

JMIR AI | 2024 / Vol. 3

Keywords:

large language model; LLM; ChatGPT; CUC-FATE framework; cost; usability; credibility; fairness; accountability; transparency; explainability; analytical hierarchy process; AHP; total interpretive structural modeling; TISM; medical education; adoption; guideline; development; health care; chat generative pretrained transformer; generative language model tool; user; innovation; data generation; narrative review; health care professional

DOI:

10.2196/51834

Chinese Library Classification (CLC) number:

R19 [Health care organizations and services (health services administration)]

Discipline classification code:

Abstract:

Background: The world has witnessed increased adoption of large language models (LLMs) in the last year. Although the products developed using LLMs have the potential to solve accessibility and efficiency problems in health care, there is a lack of available guidelines for developing LLMs for health care, especially for medical education.

Objective: The aim of this study was to identify and prioritize the enablers for developing successful LLMs for medical education. We further evaluated the relationships among these identified enablers.

Methods: A narrative review of the extant literature was first performed to identify the key enablers for LLM development. We additionally gathered the opinions of LLM users to determine the relative importance of these enablers using an analytical hierarchy process (AHP), which is a multicriteria decision-making method. Further, total interpretive structural modeling (TISM) was used to analyze the perspectives of product developers and ascertain the relationships and hierarchy among these enablers. Finally, the cross-impact matrix-based multiplication applied to a classification (MICMAC) approach was used to determine the relative driving and dependence powers of these enablers. A nonprobabilistic purposive sampling approach was used for recruitment of focus groups.

Results: The AHP demonstrated that the most important enabler for LLMs was credibility, with a priority weight of 0.37, followed by accountability (0.27642) and fairness (0.10572). In contrast, usability, with a priority weight of 0.04, showed negligible importance. The results of TISM concurred with the findings of the AHP. The only striking difference between expert perspectives and user preference evaluation was that the product developers indicated that cost has the least importance as a potential enabler. The MICMAC analysis suggested that cost has a strong influence on other enablers. The inputs of the focus group were found to be reliable, with a consistency ratio less than 0.1 (0.084).

Conclusions: This study is the first to identify, prioritize, and analyze the relationships of enablers of effective LLMs for medical education. Based on the results of this study, we developed a comprehendible prescriptive framework, named CUC-FATE (Cost, Usability, Credibility, Fairness, Accountability, Transparency, and Explainability), for evaluating the enablers of LLMs in medical education. The study findings are useful for health care professionals, health technology experts, medical technology regulators, and policy makers.
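The abstract reports AHP priority weights and a consistency ratio below the conventional 0.1 threshold. As a rough illustration of how such figures are typically obtained, the following is a minimal sketch of the standard AHP calculation (principal-eigenvector weights and Saaty's consistency ratio). It is not the authors' code: the 3x3 comparison matrix, the criterion labels A, B, C, and the function names are hypothetical, and the random index values are the commonly tabulated Saaty constants.

```python
# Minimal AHP sketch in pure Python (no third-party dependencies).
# Assumption: judgments form a positive reciprocal pairwise comparison matrix.

# Commonly tabulated Saaty random consistency index (RI) by matrix size n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

# Hypothetical judgments for three illustrative criteria A, B, C
# (NOT data from the study): A is preferred 3x over B and 5x over C.
EXAMPLE_MATRIX = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
]


def _mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def ahp_weights(matrix, iterations=100):
    """Return (weights, lambda_max) using power iteration.

    weights approximates the principal eigenvector, normalized to sum to 1;
    lambda_max approximates the principal eigenvalue.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = _mat_vec(matrix, weights)
        total = sum(product)
        weights = [p / total for p in product]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    product = _mat_vec(matrix, weights)
    lambda_max = sum(p / w for p, w in zip(product, weights)) / n
    return weights, lambda_max


def consistency_ratio(lambda_max, n):
    """Return CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


if __name__ == "__main__":
    w, lam = ahp_weights(EXAMPLE_MATRIX)
    cr = consistency_ratio(lam, len(EXAMPLE_MATRIX))
    for label, weight in zip("ABC", w):
        print(f"criterion {label}: weight {weight:.3f}")
    print(f"consistency ratio: {cr:.3f} (judgments usually accepted if < 0.1)")
```

With seven enablers, as in the CUC-FATE framework, the same functions would take a 7x7 matrix and use the random index for n = 7; how the study aggregated individual focus-group judgments into one matrix is not stated in this record.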

Pages: 14