Large Language Models lack essential metacognition for reliable medical reasoning

Cited by: 1
Authors
Griot, Maxime [1 ,2 ]
Hemptinne, Coralie [1 ,3 ]
Vanderdonckt, Jean [2 ]
Yuksel, Demet [1 ,4 ]
Affiliations
[1] Catholic Univ Louvain, Inst Neurosci, Brussels, Belgium
[2] Catholic Univ Louvain, Louvain Res Inst Management & Org, Louvain La Neuve, Belgium
[3] Clin Univ St Luc, Ophthalmol, Brussels, Belgium
[4] Clin Univ St Luc, Med Informat Dept, Brussels, Belgium
Keywords
REFLECTIVE PRACTICE; STRATEGIES;
DOI
10.1038/s41467-024-55628-6
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract
Large Language Models have demonstrated expert-level accuracy on medical board examinations, suggesting potential for clinical decision support systems. However, their metacognitive abilities, crucial for medical decision-making, remain largely unexplored. To address this gap, we developed MetaMedQA, a benchmark incorporating confidence scores and metacognitive tasks into multiple-choice medical questions. We evaluated twelve models on dimensions including confidence-based accuracy, missing answer recall, and unknown recall. Despite high accuracy on multiple-choice questions, our study revealed significant metacognitive deficiencies across all tested models. Models consistently failed to recognize their knowledge limitations and provided confident answers even when correct options were absent. In this work, we show that current models exhibit a critical disconnect between perceived and actual capabilities in medical reasoning, posing significant risks in clinical settings. Our findings emphasize the need for more robust evaluation frameworks that incorporate metacognitive abilities, essential for developing reliable Large Language Model-enhanced clinical decision support systems.
Pages: 10
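
The abstract names three evaluation dimensions: confidence-based accuracy, missing answer recall, and unknown recall. The sketch below is a minimal illustration of how such metrics could be computed over scored model responses; the field names, the "NONE"/"UNKNOWN" answer labels, the 0.8 confidence threshold, and the exact metric definitions are assumptions for illustration, not the paper's actual MetaMedQA implementation.

# Hypothetical sketch of the metacognitive metrics named in the abstract.
# All identifiers, labels, and thresholds are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    correct_option: Optional[str]   # None when the correct answer was removed from the choices
    is_unknowable: bool             # True when the question has no knowable answer
    model_answer: str               # chosen option letter, or "NONE" / "UNKNOWN" abstentions
    model_confidence: float         # self-reported confidence in [0, 1]


def confidence_based_accuracy(items: List[Item], threshold: float = 0.8) -> float:
    """Accuracy restricted to answers the model reports with confidence >= threshold."""
    confident = [it for it in items if it.model_confidence >= threshold]
    if not confident:
        return 0.0
    hits = sum(it.model_answer == it.correct_option for it in confident)
    return hits / len(confident)


def missing_answer_recall(items: List[Item]) -> float:
    """Fraction of missing-answer items where the model flagged that no option fits."""
    missing = [it for it in items if it.correct_option is None and not it.is_unknowable]
    if not missing:
        return 0.0
    return sum(it.model_answer == "NONE" for it in missing) / len(missing)


def unknown_recall(items: List[Item]) -> float:
    """Fraction of unknowable items where the model admitted it did not know."""
    unknown = [it for it in items if it.is_unknowable]
    if not unknown:
        return 0.0
    return sum(it.model_answer == "UNKNOWN" for it in unknown) / len(unknown)


if __name__ == "__main__":
    sample = [
        Item("B", False, "B", 0.95),       # confidently correct
        Item("C", False, "A", 0.90),       # confidently wrong
        Item(None, False, "D", 0.85),      # answers despite the correct option being absent
        Item(None, True, "UNKNOWN", 0.40), # correctly admits it does not know
    ]
    print(f"confidence-based accuracy: {confidence_based_accuracy(sample):.2f}")
    print(f"missing-answer recall:     {missing_answer_recall(sample):.2f}")
    print(f"unknown recall:            {unknown_recall(sample):.2f}")

Under these assumed definitions, a model that answers confidently when the correct option is missing (as the abstract reports) scores high on confidence-based accuracy for standard items but near zero on missing answer recall and unknown recall.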