Diagnostic accuracy of large language models in psychiatry

Cited by: 1
Authors
Gargari, Omid Kohandel [1 ]
Fatehi, Farhad [2 ,3 ]
Mohammadi, Ida [1 ]
Firouzabadi, Shahryar Rajai [1 ]
Shafiee, Arman [1 ]
Habibi, Gholamreza [1 ]
Affiliations
[1] Farzan Clin Res Inst, Farzan Artificial Intelligence Team, Tehran, Iran
[2] Univ Queensland, Fac Med, Ctr Hlth Serv Res, Brisbane, Australia
[3] Monash Univ, Sch Psychol Sci, Melbourne, Australia
Keywords
Artificial intelligence (AI); Psychiatry; Diagnostic accuracy; Large Language Models (LLMs); DSM-5 clinical vignettes; Natural Language Processing (NLP); CARDIOVASCULAR-DISEASES; ARTIFICIAL-INTELLIGENCE; DECISION-MAKING; PREDICTION;
DOI
10.1016/j.ajp.2024.104168
Chinese Library Classification
R749 [Psychiatry]
Discipline code
100205
Abstract
Introduction: Medical decision-making is crucial for effective treatment, especially in psychiatry, where diagnosis often relies on subjective patient reports and few highly specific symptoms. Artificial intelligence (AI), particularly large language models (LLMs) such as GPT, has emerged as a promising tool for enhancing diagnostic accuracy in psychiatry. This comparative study explores the diagnostic capabilities of several AI models, including Aya, GPT-3.5, GPT-4, GPT-3.5 clinical assistant (CA), Nemotron, and Nemotron CA, using clinical cases from the DSM-5.
Methods: We curated 20 clinical cases from the DSM-5 Clinical Cases book, covering a wide range of psychiatric diagnoses. Four advanced AI models (GPT-3.5 Turbo, GPT-4, Aya, Nemotron) were tested with prompts designed to elicit detailed diagnoses and reasoning. Model performance was evaluated on accuracy and quality of reasoning, with additional analysis using Retrieval-Augmented Generation (RAG) for models given access to the DSM-5 text.
Results: Diagnostic accuracy varied across models, with GPT-3.5 and GPT-4 performing notably better than Aya and Nemotron in both accuracy and reasoning quality. While the models struggled with certain disorders, such as cyclothymic and disruptive mood dysregulation disorders, they performed well on others, particularly psychotic and bipolar disorders. Statistical analysis showed significant differences in accuracy and reasoning, underscoring the superiority of the GPT models.
Discussion: The application of AI in psychiatry offers potential improvements in diagnostic accuracy. The superior performance of the GPT models can be attributed to their advanced natural language processing capabilities and extensive training on diverse text data, which enable more effective interpretation of psychiatric language. However, models such as Aya and Nemotron showed limitations in reasoning, indicating a need for further refinement in their training and application.
Conclusion: AI holds significant promise for enhancing psychiatric diagnostics, with certain models demonstrating high potential for accurately interpreting complex clinical descriptions. Future research should focus on expanding the dataset and integrating multimodal data to further enhance the diagnostic capabilities of AI in psychiatry.
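The abstract does not include the study's evaluation code. As a minimal illustration of the kind of scoring loop the Methods imply, the sketch below compares model-predicted diagnoses against gold-standard vignette labels and computes per-model accuracy. All function names, model outputs, and diagnoses here are hypothetical placeholders, not the study's actual cases or results.

```python
# Hypothetical sketch of per-model diagnostic accuracy scoring.
# Data and labels are illustrative only.

def accuracy(predictions, gold):
    """Fraction of vignettes where the predicted diagnosis matches
    the gold label (case-insensitive exact match)."""
    hits = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, gold)
    )
    return hits / len(gold)

# Gold-standard diagnoses for three illustrative vignettes.
gold = ["schizophrenia", "bipolar I disorder", "cyclothymic disorder"]

# Illustrative model outputs (one prediction per vignette).
outputs = {
    "gpt-4":    ["schizophrenia", "bipolar I disorder", "major depressive disorder"],
    "nemotron": ["schizophrenia", "major depressive disorder", "major depressive disorder"],
}

scores = {model: accuracy(preds, gold) for model, preds in outputs.items()}
```

A real evaluation would need fuzzier matching (or expert rating) than exact string comparison, since free-text LLM output rarely reproduces a diagnostic label verbatim; the study additionally scored reasoning quality, which this sketch omits.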
Pages: 6