Diagnostic accuracy of large language models in psychiatry

Cited by: 1
Authors
Gargari, Omid Kohandel [1 ]
Fatehi, Farhad [2 ,3 ]
Mohammadi, Ida [1 ]
Firouzabadi, Shahryar Rajai [1 ]
Shafiee, Arman [1 ]
Habibi, Gholamreza [1 ]
Affiliations
[1] Farzan Clin Res Inst, Farzan Artificial Intelligence Team, Tehran, Iran
[2] Univ Queensland, Fac Med, Ctr Hlth Serv Res, Brisbane, Australia
[3] Monash Univ, Sch Psychol Sci, Melbourne, Australia
Keywords
Artificial intelligence (AI); Psychiatry; Diagnostic accuracy; Large Language Models (LLMs); DSM-5 clinical vignettes; Natural Language Processing (NLP)
DOI
10.1016/j.ajp.2024.104168
Chinese Library Classification (CLC)
R749 [Psychiatry]
Subject Classification Code
100205
Abstract
Introduction: Medical decision-making is crucial for effective treatment, especially in psychiatry, where diagnosis often relies on subjective patient reports and on symptoms that lack high specificity. Artificial intelligence (AI), particularly Large Language Models (LLMs) such as GPT, has emerged as a promising tool to enhance diagnostic accuracy in psychiatry. This comparative study explores the diagnostic capabilities of several AI models, including Aya, GPT-3.5, GPT-4, GPT-3.5 clinical assistant (CA), Nemotron, and Nemotron CA, using clinical cases from the DSM-5.

Methods: We curated 20 clinical cases from the DSM-5 Clinical Cases book, covering a wide range of psychiatric diagnoses. Four advanced AI models (GPT-3.5 Turbo, GPT-4, Aya, Nemotron) were tested with prompts designed to elicit detailed diagnoses and reasoning. Model performance was evaluated on diagnostic accuracy and quality of reasoning, with additional analysis using the Retrieval-Augmented Generation (RAG) methodology for models given access to the DSM-5 text.

Results: The AI models showed varied diagnostic accuracy, with GPT-3.5 and GPT-4 performing notably better than Aya and Nemotron in both accuracy and reasoning quality. The models struggled with certain disorders, such as cyclothymic disorder and disruptive mood dysregulation disorder, but performed well on others, particularly psychotic and bipolar disorders. Statistical analysis showed significant differences in accuracy and reasoning across models, underscoring the superiority of the GPT models.

Discussion: The application of AI in psychiatry offers potential improvements in diagnostic accuracy. The superior performance of the GPT models can be attributed to their advanced natural language processing capabilities and extensive training on diverse text data, which enable more effective interpretation of psychiatric language. However, models such as Aya and Nemotron showed limitations in reasoning, indicating a need for further refinement in their training and application.

Conclusion: AI holds significant promise for enhancing psychiatric diagnostics, with certain models demonstrating high potential for accurately interpreting complex clinical descriptions. Future research should focus on expanding the dataset and integrating multimodal data to further enhance the diagnostic capabilities of AI in psychiatry.
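To make the workflow described in the Methods concrete, the sketch below shows how a DSM-5-style clinical vignette might be submitted to a chat-completion model to elicit a diagnosis and reasoning, with an optional retrieval step standing in for the RAG analysis. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the prompt wording, the retrieve_dsm5_context helper, and the toy excerpt list are hypothetical, and the example assumes the OpenAI Python chat-completions client.

```python
# Minimal sketch (assumptions: OpenAI chat-completions client; hypothetical prompt
# wording and a toy keyword-overlap retriever standing in for the study's RAG step).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Toy stand-in for the DSM-5 text made available to the RAG-augmented models.
DSM5_EXCERPTS = [
    "Bipolar I disorder requires at least one manic episode lasting at least one week...",
    "Cyclothymic disorder involves numerous hypomanic and depressive symptoms over at "
    "least two years that never meet full episode criteria...",
    "Schizophrenia requires two or more of: delusions, hallucinations, disorganized "
    "speech, grossly disorganized behavior, negative symptoms...",
]

def retrieve_dsm5_context(vignette: str, k: int = 2) -> str:
    """Hypothetical retriever: rank excerpts by crude word overlap with the vignette."""
    words = set(vignette.lower().split())
    ranked = sorted(
        DSM5_EXCERPTS,
        key=lambda excerpt: len(words & set(excerpt.lower().split())),
        reverse=True,
    )
    return "\n".join(ranked[:k])

def diagnose(vignette: str, model: str = "gpt-4", use_rag: bool = False) -> str:
    """Ask the model for the most likely DSM-5 diagnosis and its reasoning."""
    context = (
        f"Relevant DSM-5 text:\n{retrieve_dsm5_context(vignette)}\n\n" if use_rag else ""
    )
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "You are a psychiatric clinical assistant. Give the single "
                           "most likely DSM-5 diagnosis and explain your reasoning.",
            },
            {"role": "user", "content": context + f"Clinical vignette:\n{vignette}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    case = ("A 28-year-old presents after one week of decreased need for sleep, "
            "pressured speech, grandiose plans, and impulsive spending.")
    print(diagnose(case, use_rag=True))
```

The sketch simply returns the model's free-text response; as described in the abstract, each response would then be scored against the reference DSM-5 diagnosis for accuracy and quality of reasoning in a downstream evaluation step.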
Pages: 6