Development and Evaluation of a GPT4-Based Orofacial Pain Clinical Decision Support System

Cited by: 0
Authors
Vueghs, Charlotte [1]
Shakeri, Hamid [1]
Renton, Tara [2]
van der Cruyssen, Frederic [1, 3]
Affiliations
[1] Univ Hosp Leuven, Dept Oral & Maxillofacial Surg, B-3000 Leuven, Belgium
[2] Kings Coll London Dent Inst, Dept Oral Surg, London SE5 9RW, England
[3] Katholieke Univ Leuven, OMFS IMPATH Res Grp, B-3000 Leuven, Belgium
Keywords
validation; development; large language model; GPT4; clinical decision support system; diagnostic errors; facial pain
DOI
10.3390/diagnostics14242835
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Background: Orofacial pain (OFP) encompasses a complex array of conditions affecting the face, mouth, and jaws, often leading to significant diagnostic challenges and high rates of misdiagnosis. Artificial intelligence, particularly large language models like GPT4 (OpenAI, San Francisco, CA, USA), offers potential as a diagnostic aid in healthcare settings.
Objective: To evaluate the diagnostic accuracy of GPT4 in OFP cases as a clinical decision support system (CDSS) and compare its performance against treating clinicians, expert evaluators, medical students, and general practitioners.
Methods: A total of 100 anonymized patient case descriptions involving diverse OFP conditions were collected. GPT4 was prompted to generate primary and differential diagnoses for each case using the International Classification of Orofacial Pain (ICOP) criteria. Diagnoses were compared to gold-standard diagnoses established by treating clinicians, and a scoring system was used to assess accuracy at three hierarchical ICOP levels. A subset of 24 cases was also evaluated by two clinical experts, two final-year medical students, and two general practitioners for comparative analysis. Diagnostic performance and interrater reliability were calculated.
Results: GPT4 achieved the highest accuracy level (ICOP level 3) in 38% of cases, with an overall diagnostic performance score of 157 out of 300 points (52%). The model provided accurate differential diagnoses in 80% of cases (400 out of 500 points). In the subset of 24 cases, the model's performance was comparable to non-expert human evaluators but was surpassed by clinical experts, who correctly diagnosed 54% of cases at level 3. GPT4 demonstrated high accuracy in specific categories, correctly diagnosing 81% of trigeminal neuralgia cases at level 3. Interrater reliability between GPT4 and human evaluators was low (kappa = 0.219, p < 0.001), indicating variability in diagnostic agreement.
Conclusions: GPT4 shows promise as a CDSS for OFP by improving diagnostic accuracy and offering structured differential diagnoses. While not yet outperforming expert clinicians, GPT4 can augment diagnostic workflows, particularly in primary care or educational settings. Effective integration into clinical practice requires adherence to rigorous guidelines, thorough validation, and ongoing professional oversight to ensure patient safety and diagnostic reliability.
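The abstract does not spell out the scoring rule, so the following is only a minimal sketch of how a three-level hierarchical score and the reported percentages could be computed. It assumes one point per matching ICOP level, counted from the top of the hierarchy and stopping at the first mismatch, which is consistent with a maximum of 300 points for 100 cases. The function names and the diagnosis labels (`"A"`, `"A.1"`, ...) are hypothetical placeholders, not real ICOP codes and not the authors' implementation.

```python
from typing import Sequence


def hierarchical_score(predicted: Sequence[str], gold: Sequence[str],
                       max_level: int = 3) -> int:
    """Count matching leading levels between a predicted and a gold diagnosis.

    Assumption (not stated in the abstract): one point per level, and a
    deeper level only counts if every level above it also matches.
    """
    score = 0
    for pred_level, gold_level in list(zip(predicted, gold))[:max_level]:
        if pred_level != gold_level:
            break
        score += 1
    return score


def percent_of_max(total_points: int, n_cases: int, points_per_case: int) -> float:
    """Express a total score as a percentage of the maximum attainable score."""
    return 100.0 * total_points / (n_cases * points_per_case)


# Hypothetical example: the first two levels match, the third does not.
example = hierarchical_score(("A", "A.1", "A.1.2"), ("A", "A.1", "A.1.1"))

# Figures reported in the abstract: 157/300 points and 400/500 points.
primary_pct = percent_of_max(157, n_cases=100, points_per_case=3)
differential_pct = percent_of_max(400, n_cases=100, points_per_case=5)
```

Under these assumptions, 157 of 300 points corresponds to roughly 52% and 400 of 500 points to 80%, matching the percentages in the abstract. The five points per case for the differential diagnosis is inferred from the 500-point maximum and is likewise an assumption.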
Pages: 16