Reviewer Experience Detecting and Judging Human Versus Artificial Intelligence Content: The Stroke Journal Essay Contest

Cited by: 4
Authors
Silva, Gisele S. [1 ,2 ]
Khera, Rohan [3 ,4 ,6 ]
Schwamm, Lee H. [4 ,5 ,7 ]
Affiliations
[1] Univ Fed Sao Paulo, Hosp Israelita Albert Einstein, Sao Paulo, Brazil
[2] Univ Fed Sao Paulo, Dept Neurol & Neurocirurgia, Sao Paulo, Brazil
[3] Yale Sch Med, Sect Cardiovasc Med, New Haven, CT USA
[4] Yale Sch Med, Biomed Informat & Data Sci, New Haven, CT USA
[5] Yale Sch Med, Dept Neurol, New Haven, CT USA
[6] Yale Sch Publ Hlth, Dept Biostat, Sect Hlth Informat, New Haven, CT USA
[7] Yale New Haven Hlth Syst, Digital & Technol Solut, New Haven, CT USA
Funding
US National Institutes of Health;
Keywords
artificial intelligence; neurologists; peer review; stroke; writing; CHATGPT;
DOI
10.1161/STROKEAHA.124.045012
Chinese Library Classification (CLC)
R74 [Neurology and Psychiatry];
Abstract
Artificial intelligence (AI) large language models (LLMs) now produce human-like general text and images. LLMs' ability to generate persuasive scientific essays that undergo evaluation under traditional peer review has not been systematically studied. To measure perceptions of quality and the nature of authorship, we conducted a competitive essay contest in 2024 with both human and AI participants. Human authors and 4 distinct LLMs generated essays on controversial topics in stroke care and outcomes research. A panel of Stroke Editorial Board members (mostly vascular neurologists), blinded to author identity and with varying levels of AI expertise, rated the essays for quality, persuasiveness, best in topic, and author type. Among 34 submissions (22 human and 12 LLM) scored by 38 reviewers, human and AI essays received mostly similar ratings, though AI essays were rated higher for composition quality. Author type was accurately identified only 50% of the time, with prior LLM experience associated with improved accuracy. In multivariable analyses adjusted for author attributes and essay quality, only persuasiveness was independently associated with odds of a reviewer assigning AI as author type (adjusted odds ratio, 1.53 [95% CI, 1.09-2.16]; P=0.01). In conclusion, a group of experienced editorial board members struggled to distinguish human versus AI authorship, with a bias against awarding best in topic to essays judged to be AI-generated. Scientific journals may benefit from educating reviewers on the types and uses of AI in scientific writing and developing thoughtful policies on the appropriate use of AI in authoring manuscripts.
Pages: 2573-2578 (6 pages)