Reasoning with large language models for medical question answering

Cited by: 3
Authors
Lucas, Mary M. [1 ]
Yang, Justin [2 ]
Pomeroy, Jon K. [1 ,3 ]
Yang, Christopher C. [1 ]
Affiliations
[1] Drexel Univ, Coll Comp & Informat, 3141 Chestnut St, Philadelphia, PA 19104 USA
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[3] Penn Med, Coll Comp & Informat, Philadelphia, PA 19104 USA
Funding
US National Science Foundation;
Keywords
large language model; clinical reasoning; machine reasoning; artificial intelligence;
DOI
10.1093/jamia/ocae131
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Objectives: To investigate approaches to reasoning with large language models (LLMs) and to propose a new prompting approach, ensemble reasoning, to improve medical question answering performance with refined reasoning and reduced inconsistency.
Materials and Methods: We used multiple-choice questions from the USMLE Sample Exam question files on 2 closed-source commercial LLMs and 1 open-source clinical LLM to evaluate our proposed ensemble reasoning approach.
Results: On GPT-3.5 turbo and Med42-70B, our proposed ensemble reasoning approach outperformed zero-shot chain-of-thought with self-consistency on Step 1, 2, and 3 questions (+3.44%, +4.00%, and +2.54%) and (+2.30%, +5.00%, and +4.15%), respectively. With GPT-4 turbo, results were mixed, with ensemble reasoning again outperforming zero-shot chain-of-thought with self-consistency on Step 1 questions (+1.15%). In all cases, the results demonstrated improved consistency of responses with our approach. A qualitative analysis of the model's reasoning showed that the ensemble reasoning approach produces correct and helpful reasoning.
Conclusion: The proposed iterative ensemble reasoning has the potential to improve the performance of LLMs on medical question answering tasks, particularly for less powerful LLMs such as GPT-3.5 turbo and Med42-70B, suggesting that it is a promising approach for lower-capability LLMs. The findings also show that our approach helps refine the reasoning generated by the LLM, improving consistency even with the more powerful GPT-4 turbo. We also identify the potential of, and need for, human-artificial intelligence teaming to improve the reasoning beyond the limits of the model.
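The zero-shot chain-of-thought with self-consistency baseline that the abstract compares against reduces, at the aggregation step, to a majority vote over answers extracted from multiple sampled reasoning chains. A minimal sketch of that voting step (function name and sample values are illustrative, not taken from the paper; obtaining the sampled completions from an LLM is out of scope here):

```python
from collections import Counter

def self_consistency_vote(sampled_answers):
    """Majority vote over final answers extracted from multiple
    sampled chain-of-thought completions for the same question.
    Returns the winning answer and the fraction of samples agreeing,
    which can serve as a simple consistency score."""
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    consistency = votes / len(sampled_answers)
    return answer, consistency

# Example: five sampled completions to one USMLE-style item,
# each already reduced to its final multiple-choice letter.
samples = ["B", "B", "C", "B", "A"]
answer, consistency = self_consistency_vote(samples)
# answer == "B", consistency == 0.6
```

The paper's ensemble reasoning approach iterates beyond this single vote to refine the reasoning itself; the sketch above covers only the self-consistency aggregation it is measured against.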
Pages: 13