Performance of Artificial Intelligence Content Detectors Using Human and Artificial Intelligence-Generated Scientific Writing

Cited by: 8
Authors
Flitcroft, Madelyn A. [1]
Sheriff, Salma A. [1]
Wolfrath, Nathan [1]
Maddula, Ragasnehith [1]
McConnell, Laura [1]
Xing, Yun [1]
Haines, Krista L. [2]
Wong, Sandra L. [3]
Kothari, Anai N. [1,4]
Affiliations
[1] Med Coll Wisconsin, Dept Surg, Div Surg Oncol, Milwaukee, WI 53226 USA
[2] Duke Univ, Dept Surg, Div Trauma Crit Care & Acute Care Surg, Durham, NC USA
[3] Dartmouth Hitchcock Med Ctr, Dept Surg, Lebanon, NH USA
[4] Med Coll Wisconsin, Clin & Translat Sci Inst SE Wisconsin, Milwaukee, WI 53226 USA
Keywords
Artificial intelligence; Generative AI; ChatGPT; AI detection
DOI
10.1245/s10434-024-15549-6
Chinese Library Classification (CLC) number
R73 [Oncology]
Discipline classification code
100214
Abstract
Background: Few studies have examined the performance of artificial intelligence (AI) content detection in scientific writing. This study evaluates the performance of publicly available AI content detectors when applied to both human-written and AI-generated scientific articles.
Methods: Articles published in Annals of Surgical Oncology (ASO) during the year 2022, as well as AI-generated articles using OpenAI's ChatGPT, were analyzed by three AI content detectors to assess the probability of AI-generated content. Full manuscripts and their individual sections were evaluated. Group comparisons and trend analyses were conducted by using ANOVA and linear regression. Classification performance was determined using area under the curve (AUC).
Results: A total of 449 original articles met inclusion criteria and were evaluated to determine the likelihood of being generated by AI. Each detector also evaluated 47 AI-generated articles by using titles from ASO articles. Human-written articles had an average probability of being AI-generated of 9.4% with significant differences between the detectors. Only two (0.4%) human-written manuscripts were detected as having a 0% probability of being AI-generated by all three detectors. Completely AI-generated articles were evaluated to have a higher average probability of being AI-generated (43.5%) with a range from 12.0 to 99.9%.
Conclusions: This study demonstrates differences in the performance of various AI content detectors with the potential to label human-written articles as AI-generated. Any effort toward implementing AI detectors must include a strategy for continuous evaluation and validation as AI models and detectors rapidly evolve.
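The abstract states that classification performance was determined using AUC. As a rough illustration of what that metric measures, the sketch below computes AUC as the probability that a randomly chosen AI-generated text receives a higher detector score than a randomly chosen human-written text, with ties counted as half. The function name `auc_from_scores` and all scores are hypothetical and invented for illustration; they are not data from the study, and the authors' actual software is not described in this record.

```python
def auc_from_scores(human_scores, ai_scores):
    """Pairwise (Mann-Whitney) estimate of AUC.

    Returns the fraction of (AI, human) pairs in which the AI-generated
    text has the higher detector score; ties contribute 0.5.
    """
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


# Hypothetical detector outputs (probability of AI authorship), NOT study data.
human = [0.02, 0.10, 0.35, 0.05]
ai = [0.12, 0.60, 0.99, 0.30]

auc = auc_from_scores(human, ai)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a detector that ranks the two groups no better than chance, and 1.0 to perfect separation. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead of the explicit double loop.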
Pages: 6387-6393
Page count: 7
Related papers
28 in total
  • [1] AI Detector Tool Checks ChatGPT GPT-4 Bard Claude & More, CONTENT SCALE
  • [2] Exploring the Boundaries of Reality: Investigating the Phenomenon of Artificial Intelligence Hallucination in Scientific Writing Through ChatGPT References
    Athaluri, Sai Anirudh
    Manthena, Sandeep Varma
    Kesapragada, V. S. R. Krishna Manoj
    Yarlagadda, Vineel
    Dave, Tirth
    Duddumpudi, Rama Tulasi Siri
    [J]. CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (04)
  • [3] Generative artificial intelligence: Can ChatGPT write a quality abstract?
    Babl, Franz E.
    Babl, Maximilian P.
    [J]. EMERGENCY MEDICINE AUSTRALASIA, 2023, 35 (05) : 809 - 811
  • [4] Basta C, ARXIV
  • [5] On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
    Bender, Emily M.
    Gebru, Timnit
    McMillan-Major, Angelina
    Shmitchell, Shmargaret
    [J]. PROCEEDINGS OF THE 2021 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, FACCT 2021, 2021 : 610 - 623
  • [6] Brainard J, 2023, SCIENCE, V379, P740, DOI 10.1126/science.adh2762
  • [7] Cao Y., 2023, ARXIV
  • [8] Chemaya N, 2023, ARXIV
  • [9] From human writing to artificial intelligence generated text: examining the prospects and potential threats of ChatGPT in academic writing
    Dergaa, Ismail
    Chamari, Karim
    Zmijewski, Piotr
    Saad, Helmi Ben
    [J]. BIOLOGY OF SPORT, 2023, 40 (02) : 615 - 622
  • [10] Editorial: Generative artificial intelligence as a plagiarism problem
    Dien, Joseph
    [J]. BIOLOGICAL PSYCHOLOGY, 2023, 181