AI chatbots not yet ready for clinical use

Cited by: 81

Authors:

Au Yeung, Joshua [1,2]

Kraljevic, Zeljko [3]

Luintel, Akish [1]

Balston, Alfred [2]

Idowu, Esther [2]

Dobson, Richard J. J. [3,4,5]

Teo, James T. T. [1,2]

Affiliations:

[1] Kings Coll Hosp London, Dept Neurosci, London, England

[2] Guys & St Thomas Hosp, London, England

[3] Kings Coll London, Dept Biostat, London, England

[4] South London & Maudsley NHS Fdn Trust, NIHR Biomed Res Ctr, London, England

[5] Kings Coll London, London, England

Source:

FRONTIERS IN DIGITAL HEALTH | 2023 / Volume 5

Keywords:

large language models; chatbot; natural language processing (computer science); digital health; AI safety; transformer

DOI:

10.3389/fdgth.2023.1161098

Chinese Library Classification:

R19 [Health care organizations and services (health services administration)]

Abstract:

As large language models (LLMs) expand and become more advanced, so do the natural language processing capabilities of conversational AI, or "chatbots". OpenAI's recent release, ChatGPT, uses a transformer-based model to enable human-like text generation and question-answering on general domain knowledge, while healthcare-specific LLMs such as GatorTron focus on real-world healthcare domain knowledge. As LLMs advance to achieve near-human-level performance on medical question-answering benchmarks, it is probable that conversational AI will soon be developed for use in healthcare. In this article we discuss the potential and compare the performance of two different approaches to generative pretrained transformers (GPTs): ChatGPT, the most widely used general conversational LLM, and Foresight, a GPT-based model focused on modelling patients and disorders. The comparison is conducted on the task of forecasting relevant diagnoses based on clinical vignettes. We also discuss important considerations and limitations of transformer-based chatbots for clinical use.

Pages: 5
