Human versus artificial intelligence-generated arthroplasty literature: A single-blinded analysis of perceived communication, quality, and authorship source

Cited by: 5
Authors
Lawrence, Kyle W. [1, 2]
Habibi, Akram A. [1]
Ward, Spencer A. [1]
Lajam, Claudette M. [1]
Schwarzkopf, Ran [1]
Rozell, Joshua C. [1]
Affiliations
[1] NYU Langone Hlth, Dept Orthoped Surg, New York, NY USA
[2] NYU Langone Hlth, Dept Orthoped Surg, 301 East 17th St, 15th Floor, Suite 1518, New York, NY 10003 USA
Keywords
artificial intelligence; ChatGPT; large language models; medical literature; total hip arthroplasty; total knee arthroplasty
DOI
10.1002/rcs.2621
Chinese Library Classification (CLC) number
R61 [Operative surgery]
Abstract
Background: Large language models (LLMs) have unknown implications for medical research. This study assessed whether LLM-generated abstracts are distinguishable from human-written abstracts and compared their perceived quality.
Methods: The LLM ChatGPT was used to generate 20 arthroplasty abstracts (AI-generated) based on full-text manuscripts, which were compared to the originally published abstracts (human-written). Six blinded orthopaedic surgeons rated abstracts on overall quality, communication, and confidence in the authorship source. Authorship-confidence scores were compared to a test value representing complete inability to discern authorship.
Results: Modestly increased confidence in human authorship was observed for human-written abstracts compared with AI-generated abstracts (p = 0.028), though authorship-confidence scores for AI-generated abstracts were statistically consistent with an inability to discern authorship (p = 0.999). Overall abstract quality was higher for human-written abstracts (p = 0.019).
Conclusions: Absolute authorship-confidence ratings for AI-generated abstracts indicated that raters had difficulty discerning authorship, but AI-generated abstracts did not achieve the perceived quality of human-written abstracts. Caution is warranted in implementing LLMs in scientific writing.
Pages: 9