Estimating the quality of published medical research with ChatGPT

Times cited: 0
Authors
Thelwall, Mike [1]
Jiang, Xiaorui [1]
Bath, Peter A. [2]
Affiliations
[1] Univ Sheffield, Informat Sch, Sheffield, England
[2] Univ Sheffield, Informat Sch, Hlth Informat Res Grp, Sheffield, England
Funding
UK Economic and Social Research Council (ESRC);
Keywords
Research evaluation; Medical research evaluation; ChatGPT; Large language models; AI research evaluation; THERAPY;
DOI
10.1016/j.ipm.2025.104123
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Estimating the quality of published research is important for evaluations of departments, researchers, and job candidates. Citation-based indicators sometimes support these tasks, but do not work for new articles and have low or moderate accuracy. Previous research has shown that ChatGPT can estimate the quality of research articles, with its scores correlating positively with a proxy for expert scores in all fields, and often more strongly than citation-based indicators, except for clinical medicine. ChatGPT scores may therefore replace citation-based indicators for some applications. This article investigates the clinical medicine anomaly with the largest dataset yet and a more detailed analysis. The results showed that ChatGPT 4o-mini scores for articles submitted to the UK's Research Excellence Framework (REF) 2021 Unit of Assessment (UoA) 1 Clinical Medicine correlated positively (r = 0.134, n = 9872) with departmental mean REF scores, against a theoretical maximum correlation of r = 0.226. ChatGPT 4o and 3.5 turbo also gave positive correlations. At the departmental level, mean ChatGPT scores correlated more strongly with departmental mean REF scores (r = 0.395, n = 31). For the 100 journals with the most articles in UoA 1, their mean ChatGPT score correlated strongly with their departmental mean REF score (r = 0.495) but negatively with their citation rate (r = -0.148). Journal and departmental anomalies in these results point to ChatGPT being ineffective at assessing the quality of research in prestigious medical journals or research directly affecting human health, or both. Nevertheless, the results give evidence of ChatGPT's ability to assess research quality overall for Clinical Medicine, where it might replace citation-based indicators for new research.
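The two levels of comparison in the abstract (article-level scores against departmental mean REF scores, then departmental mean scores against the same REF means) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `pearson` and `department_means`, the row layout, and the six sample rows are all hypothetical, and the sample values are made up purely for illustration. Only the final ratio uses figures quoted in the abstract.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def department_means(rows):
    """Average the model score per department.

    rows: iterable of (department, model_score, dept_mean_ref_score).
    Returns two parallel lists: mean model score and REF mean per department.
    """
    scores = defaultdict(list)
    ref = {}
    for dept, score, dept_ref in rows:
        scores[dept].append(score)
        ref[dept] = dept_ref
    depts = sorted(scores)
    return ([sum(scores[d]) / len(scores[d]) for d in depts],
            [ref[d] for d in depts])


# Hypothetical rows (department, model score, departmental mean REF score).
# These values are invented for illustration and are NOT data from the study.
rows = [
    ("A", 3.0, 3.1), ("A", 2.0, 3.1),
    ("B", 3.0, 3.4), ("B", 4.0, 3.4),
    ("C", 2.0, 2.8), ("C", 3.0, 2.8),
]

# Article level: every article is paired with its department's REF mean.
article_r = pearson([r[1] for r in rows], [r[2] for r in rows])

# Department level: model scores are averaged per department first.
dept_scores, dept_ref = department_means(rows)
dept_r = pearson(dept_scores, dept_ref)

# Share of the theoretical maximum reached, using the abstract's figures.
share_of_max = 0.134 / 0.226
```

In this toy data, averaging within departments removes article-level noise, so the departmental correlation comes out higher than the article-level one, the same direction as the reported r = 0.395 versus r = 0.134. The last line shows that the article-level correlation is roughly 59% of the stated theoretical maximum.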
Pages: 11