Estimating the quality of published medical research with ChatGPT

Cited: 0
Authors
Thelwall, Mike [1 ]
Jiang, Xiaorui [1 ]
Bath, Peter A. [2 ]
Affiliations
[1] Univ Sheffield, Informat Sch, Sheffield, England
[2] Univ Sheffield, Informat Sch, Hlth Informat Res Grp, Sheffield, England
Funding
UK Economic and Social Research Council;
Keywords
Research evaluation; Medical research evaluation; ChatGPT; Large language models; AI research evaluation; THERAPY;
DOI
10.1016/j.ipm.2025.104123
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Estimating the quality of published research is important for evaluations of departments, researchers, and job candidates. Citation-based indicators sometimes support these tasks, but do not work for new articles and have low or moderate accuracy. Previous research has shown that ChatGPT can estimate the quality of research articles, with its scores correlating positively with a proxy for expert scores in all fields, and often more strongly than citation-based indicators, except in clinical medicine. ChatGPT scores may therefore replace citation-based indicators for some applications. This article investigates the clinical medicine anomaly with the largest dataset yet and a more detailed analysis. The results showed that ChatGPT 4o-mini scores for articles submitted to the UK's Research Excellence Framework (REF) 2021 Unit of Assessment (UoA) 1 Clinical Medicine correlated positively (r = 0.134, n = 9872) with departmental mean REF scores, against a theoretical maximum correlation of r = 0.226. ChatGPT 4o and 3.5 turbo also gave positive correlations. At the departmental level, mean ChatGPT scores correlated more strongly with departmental mean REF scores (r = 0.395, n = 31). For the 100 journals with the most articles in UoA 1, their mean ChatGPT score correlated strongly with their departmental mean REF score (r = 0.495) but negatively with their citation rate (r = -0.148). Journal and departmental anomalies in these results point to ChatGPT being ineffective at assessing the quality of research in prestigious medical journals or research directly affecting human health, or both. Nevertheless, the results give evidence of ChatGPT's ability to assess research quality overall for Clinical Medicine, where it might replace citation-based indicators for new research.
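As a rough illustration of the correlation analysis summarised above (not the authors' code), the sketch below shows how article-level ChatGPT quality scores could be compared with departmental mean REF scores at both the article and department levels using Pearson correlation; the column names and the toy data are assumptions introduced here for the example.

    # Illustrative sketch only: correlating ChatGPT quality scores with
    # departmental mean REF scores at two levels of aggregation.
    import pandas as pd
    from scipy.stats import pearsonr

    # Hypothetical input: one row per article, with its ChatGPT score and the
    # mean REF score of the department that submitted it.
    df = pd.DataFrame({
        "department": ["A", "A", "B", "B", "C", "C"],
        "chatgpt_score": [2.5, 3.0, 3.5, 3.0, 4.0, 3.5],
        "dept_mean_ref": [2.8, 2.8, 3.1, 3.1, 3.6, 3.6],
    })

    # Article-level correlation (cf. r = 0.134, n = 9872 in the abstract).
    r_article, _ = pearsonr(df["chatgpt_score"], df["dept_mean_ref"])

    # Department-level correlation: average the ChatGPT scores per department
    # first, then correlate the departmental means (cf. r = 0.395, n = 31).
    dept = df.groupby("department").agg(
        mean_chatgpt=("chatgpt_score", "mean"),
        dept_mean_ref=("dept_mean_ref", "first"),
    )
    r_dept, _ = pearsonr(dept["mean_chatgpt"], dept["dept_mean_ref"])

    print(f"article-level r = {r_article:.3f}, department-level r = {r_dept:.3f}")

Averaging to departmental means before correlating smooths out article-level noise, which is consistent with the stronger departmental-level correlation reported in the abstract.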
Pages: 11