Integrating AI into clinical education: evaluating general practice trainees' proficiency in distinguishing AI-generated hallucinations and impacting factors

Cited by: 0
Authors
Zhou, Jiacheng [1,2]
Zhang, Jintao [1,2]
Wan, Rongrong [1,2]
Cui, Xiaochuan [1,2]
Liu, Qiyu [1,2]
Guo, Hua [1,2]
Shi, Xiaofen [1,2]
Fu, Bingbing [3]
Meng, Jia [4]
Yue, Bo [5]
Zhang, Yunyun [1,2,3,6]
Zhang, Zhiyong [1,2,3,6]
Affiliations
[1] Nanjing Med Univ, Affiliated Wuxi Peoples Hosp, Dept Gen Practice, Wuxi, Jiangsu, Peoples R China
[2] Nanjing Med Univ, Wuxi Peoples Hosp, Wuxi Med Ctr, Wuxi, Jiangsu, Peoples R China
[3] Jiamusi Univ, Affiliated Hosp 1, Dept Postgrad Educ, Heilongjiang, Peoples R China
[4] Harbin Med Univ, Affiliated Hosp 2, Dept Gen Practice, Heilongjiang, Peoples R China
[5] Qiqihar Med Univ, Affiliated Hosp 2, Residency Training Ctr, Heilongjiang, Peoples R China
[6] Nanjing Med Univ, Wuxi Peoples Hosp, Affiliated Wuxi Peoples Hosp, Wuxi Med Ctr,Educ Dept, Qingyang Rd 299, Wuxi, Peoples R China
Keywords
ChatGPT-4o generated hallucinations; General practice (GP) trainees; General practice specialist training; Response bias; ARTIFICIAL-INTELLIGENCE; CHATGPT
DOI
10.1186/s12909-025-06916-2
Chinese Library Classification (CLC) number
G40 [Education]
Subject classification code
040101; 120403
Abstract
Objective: To assess the ability of general practice (GP) trainees to detect AI-generated hallucinations in simulated clinical practice, using ChatGPT-4o as the source of the AI output. The hallucinations were categorized into three types based on the accuracy of the answers and explanations: (1) correct answers with incorrect or flawed explanations, (2) incorrect answers with explanations that contradict factual evidence, and (3) incorrect answers with correct explanations.
Methods: This multi-center, cross-sectional survey study involved 142 GP trainees, all of whom were undergoing General Practice Specialist Training and volunteered to participate. The study evaluated the accuracy and consistency of ChatGPT-4o, as well as the trainees' response time, accuracy, sensitivity (d'), and response tendency (beta). Binary regression analysis was used to explore factors affecting the trainees' ability to identify errors generated by ChatGPT-4o.
Results: A total of 137 participants were included, with a mean age of 25.93 years. Half of the participants were unfamiliar with AI, and 35.0% had never used it. ChatGPT-4o's overall accuracy was 80.8%, which decreased slightly to 80.1% after human verification. However, its accuracy for professional practice (Subject 4) was only 57.0%, dropping further to 44.2% after human verification. A total of 87 AI-generated hallucinations were identified, primarily at the application and evaluation levels. The mean accuracy of detecting these hallucinations was 55.0%, and the mean sensitivity (d') was 0.39. Regression analysis revealed that shorter response times (OR = 0.92, P = 0.02), higher self-assessed AI understanding (OR = 0.16, P = 0.04), and more frequent AI use (OR = 10.43, P = 0.01) were associated with stricter error detection criteria.
Conclusions: GP trainees faced challenges in identifying ChatGPT-4o's errors, particularly in clinical scenarios. This highlights the importance of improving AI literacy and critical thinking skills to ensure effective integration of AI into medical education.
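The abstract reports sensitivity (d') and response tendency (beta) without stating how they were computed. As a minimal sketch, the standard equal-variance signal detection formulas are d' = z(H) - z(F) and beta = exp((z(F)^2 - z(H)^2) / 2), where H is the hit rate and F the false-alarm rate. The function name `sdt_measures`, the mapping of "hit" to correctly flagging a hallucinated item, and the log-linear correction for extreme rates are assumptions made for illustration and may differ from the authors' procedure.

```python
from math import exp
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, beta) under the equal-variance Gaussian model.

    Assumed mapping (not taken from the paper):
      hit               = hallucinated item flagged as wrong
      miss              = hallucinated item accepted as correct
      false alarm       = accurate item flagged as wrong
      correct rejection = accurate item accepted as correct
    """
    # Log-linear correction (assumption): keeps both rates strictly
    # between 0 and 1 so the inverse normal CDF stays finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)

    d_prime = z_hit - z_fa                 # sensitivity
    beta = exp((z_fa**2 - z_hit**2) / 2)   # likelihood-ratio response bias
    return d_prime, beta


if __name__ == "__main__":
    # Hypothetical counts for one respondent, for illustration only.
    d_prime, beta = sdt_measures(hits=30, misses=10,
                                 false_alarms=10, correct_rejections=30)
    print(f"d' = {d_prime:.2f}, beta = {beta:.2f}")
```

Under this model a d' near 0 means hallucinated and accurate items are barely distinguished, and beta above 1 indicates a conservative tendency (reluctance to flag an item as wrong), while beta below 1 indicates a liberal one.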
Pages: 9