Large Language Models lack essential metacognition for reliable medical reasoning

Cited by: 1
Authors
Griot, Maxime [1 ,2 ]
Hemptinne, Coralie [1 ,3 ]
Vanderdonckt, Jean [2 ]
Yuksel, Demet [1 ,4 ]
Affiliations
[1] Catholic Univ Louvain, Inst Neurosci, Brussels, Belgium
[2] Catholic Univ Louvain, Louvain Res Inst Management & Org, Louvain La Neuve, Belgium
[3] Clin Univ St Luc, Ophthalmol, Brussels, Belgium
[4] Clin Univ St Luc, Med Informat Dept, Brussels, Belgium
Keywords
REFLECTIVE PRACTICE; STRATEGIES;
DOI
10.1038/s41467-024-55628-6
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract
Large Language Models have demonstrated expert-level accuracy on medical board examinations, suggesting potential for clinical decision support systems. However, their metacognitive abilities, crucial for medical decision-making, remain largely unexplored. To address this gap, we developed MetaMedQA, a benchmark incorporating confidence scores and metacognitive tasks into multiple-choice medical questions. We evaluated twelve models on dimensions including confidence-based accuracy, missing answer recall, and unknown recall. Despite high accuracy on multiple-choice questions, our study revealed significant metacognitive deficiencies across all tested models. Models consistently failed to recognize their knowledge limitations and provided confident answers even when correct options were absent. In this work, we show that current models exhibit a critical disconnect between perceived and actual capabilities in medical reasoning, posing significant risks in clinical settings. Our findings emphasize the need for more robust evaluation frameworks that incorporate metacognitive abilities, essential for developing reliable Large Language Model-enhanced clinical decision support systems.
Pages: 10
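
The abstract names three evaluation dimensions: confidence-based accuracy, missing answer recall, and unknown recall. The sketch below is a minimal illustration of how such metrics could be computed over scored model responses; the field names, the "NONE"/"UNKNOWN" answer labels, the 0.8 confidence threshold, and the exact metric definitions are assumptions for illustration, not the paper's actual MetaMedQA implementation.

# Hypothetical sketch of the metacognitive metrics named in the abstract.
# All identifiers, labels, and thresholds are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    correct_option: Optional[str]   # None when the correct answer was removed from the choices
    is_unknowable: bool             # True when the question has no knowable answer
    model_answer: str               # chosen option letter, or "NONE" / "UNKNOWN" abstentions
    model_confidence: float         # self-reported confidence in [0, 1]


def confidence_based_accuracy(items: List[Item], threshold: float = 0.8) -> float:
    """Accuracy restricted to answers the model reports with confidence >= threshold."""
    confident = [it for it in items if it.model_confidence >= threshold]
    if not confident:
        return 0.0
    hits = sum(it.model_answer == it.correct_option for it in confident)
    return hits / len(confident)


def missing_answer_recall(items: List[Item]) -> float:
    """Fraction of missing-answer items where the model flagged that no option fits."""
    missing = [it for it in items if it.correct_option is None and not it.is_unknowable]
    if not missing:
        return 0.0
    return sum(it.model_answer == "NONE" for it in missing) / len(missing)


def unknown_recall(items: List[Item]) -> float:
    """Fraction of unknowable items where the model admitted it did not know."""
    unknown = [it for it in items if it.is_unknowable]
    if not unknown:
        return 0.0
    return sum(it.model_answer == "UNKNOWN" for it in unknown) / len(unknown)


if __name__ == "__main__":
    sample = [
        Item("B", False, "B", 0.95),       # confidently correct
        Item("C", False, "A", 0.90),       # confidently wrong
        Item(None, False, "D", 0.85),      # answers despite the correct option being absent
        Item(None, True, "UNKNOWN", 0.40), # correctly admits it does not know
    ]
    print(f"confidence-based accuracy: {confidence_based_accuracy(sample):.2f}")
    print(f"missing-answer recall:     {missing_answer_recall(sample):.2f}")
    print(f"unknown recall:            {unknown_recall(sample):.2f}")

Under these assumed definitions, a model that answers confidently when the correct option is missing (as the abstract reports) scores high on confidence-based accuracy for standard items but near zero on missing answer recall and unknown recall.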