ChatGPT and Bard exhibit spontaneous citation fabrication during psychiatry literature search

Cited by: 69
Authors
McGowan, Alessia [1]
Gui, Yunlai [1]
Dobbs, Matthew [1]
Shuster, Sophia [1]
Cotter, Matthew [1]
Selloni, Alexandria [1]
Goodman, Marianne [1,2]
Srivastava, Agrima [1]
Cecchi, Guillermo A. [3]
Corcoran, Cheryl M. [1,2]
Affiliations
[1] Icahn Sch Med Mt Sinai, New York, NY 10029 USA
[2] James J Peters Vet Adm, Bronx, NY 10468 USA
[3] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
Keywords
Natural language processing; Linguistic; Large language models; Literature search; Citations; References; ChatGPT; Bard; Fabrication; Artificial intelligence; Psychosis; Language
DOI
10.1016/j.psychres.2023.115334
Chinese Library Classification number
R749 [Psychiatry]
Discipline classification code
100205
Abstract
ChatGPT (Generative Pre-Trained Transformer) is a large language model (LLM), which comprises a neural network that has learned information and patterns of language use from large amounts of text on the internet. ChatGPT, introduced by OpenAI, responds to human queries in a conversational manner. Here, we aimed to assess whether ChatGPT could reliably produce accurate references to supplement the literature search process. We describe our March 2023 exchange with ChatGPT, which generated thirty-five citations, two of which were real. Twelve citations were similar to actual manuscripts (e.g., titles with incorrect author lists, journals, or publication years) and the remaining 21, while plausible, were in fact a pastiche of multiple existent manuscripts. In June 2023, we re-tested ChatGPT's performance and compared it to that of Google's GPT counterpart, Bard 2.0. We investigated performance in English, as well as in Spanish and Italian. Fabrications made by LLMs, including erroneous citations, have been called "hallucinations"; we discuss reasons for which this is a misnomer. Furthermore, we describe potential explanations for citation fabrication by GPTs, as well as measures being taken to remedy this issue, including reinforcement learning. Our results underscore that output from conversational LLMs should be verified.
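The counts reported in the abstract for the March 2023 exchange can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the paper: the category labels are shorthand chosen here for illustration, and the helper name `share` is hypothetical.

```python
# Counts taken from the abstract (March 2023 ChatGPT exchange).
# The dictionary keys are illustrative shorthand, not the authors' terminology.
counts = {
    "real": 2,
    "similar to a real manuscript": 12,
    "pastiche of several manuscripts": 21,
}

# The three categories should account for all thirty-five citations.
total = sum(counts.values())


def share(n, total):
    """Return n/total as a percentage rounded to one decimal place."""
    return round(100 * n / total, 1)


shares = {label: share(n, total) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n}/{total} ({shares[label]}%)")
```

Under these counts, the three categories sum to the thirty-five citations stated in the abstract, and fewer than 6% of them were real.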
Pages: 6