Methodological insights into ChatGPT's screening performance in systematic reviews

Cited by: 7
Authors
Issaiy, Mahbod [1 ]
Ghanaati, Hossein [1 ]
Kolahi, Shahriar [1 ]
Shakiba, Madjid [1 ]
Jalali, Amir Hossein [1 ]
Zarei, Diana [1 ]
Kazemian, Sina [2 ]
Avanaki, Mahsa Alborzi [1 ]
Firouznia, Kavous [1 ]
Institutions
[1] Univ Tehran Med Sci, Adv Diagnost & Intervent Radiol Res Ctr ADIR, Tehran, Iran
[2] Univ Tehran Med Sci, Cardiovasc Dis Res Inst, Cardiac Primary Prevent Res Ctr, Tehran, Iran
Keywords
Systematic review; ChatGPT; AI; Large language model; Article screening; Radiology; GPT
DOI
10.1186/s12874-024-02203-8
Chinese Library Classification
R19 [Health organization and services (health services management)]
Abstract
Background: The screening process for systematic reviews and meta-analyses in medical research is a labor-intensive and time-consuming task. While machine learning and deep learning have been applied to facilitate this process, these methods often require training data and user annotation. This study aims to assess the efficacy of ChatGPT, a large language model based on the Generative Pretrained Transformers (GPT) architecture, in automating the screening process for systematic reviews in radiology without the need for training data.

Methods: A prospective simulation study was conducted between May 2 and May 24, 2023, comparing ChatGPT's performance in screening abstracts against that of general physicians (GPs). A total of 1198 abstracts across three subfields of radiology were evaluated. Metrics included sensitivity, specificity, positive and negative predictive values (PPV and NPV), and workload saving, among others. Statistical analyses included the Kappa coefficient for inter-rater agreement, ROC curve plotting, AUC calculation, and bootstrapping for p-values and confidence intervals.

Results: ChatGPT completed the screening process within an hour, while the GPs took an average of 7-10 days. The AI model achieved a sensitivity of 95% and an NPV of 99%, slightly outperforming the GPs' sensitive consensus (i.e., including a record if at least one rater includes it). It also exhibited remarkably low false-negative counts and high workload savings, ranging from 40% to 83%. However, ChatGPT had lower specificity and PPV than the human raters. The average Kappa agreement between ChatGPT and the other raters was 0.27.

Conclusions: ChatGPT shows promise in automating the article-screening phase of systematic reviews, achieving high sensitivity and substantial workload savings. While it cannot entirely replace human expertise, it could serve as an efficient first-line screening tool, particularly in reducing the burden on human resources. Further studies are needed to fine-tune its capabilities and validate its utility across different medical subfields.
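The metrics reported in the abstract (sensitivity, specificity, PPV, NPV, and the Kappa coefficient) can all be derived from a 2x2 table of model decisions against human decisions. A minimal sketch in Python; the counts `tp`, `fp`, `fn`, `tn` are illustrative assumptions, not the study's actual data:

```python
# Hypothetical confusion-matrix counts for an abstract-screening run:
# tp = relevant records the model included, fn = relevant records it missed,
# fp = irrelevant records it included, tn = irrelevant records it excluded.
tp, fp, fn, tn = 95, 300, 5, 798

sensitivity = tp / (tp + fn)   # recall on relevant records
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value

# Cohen's kappa: chance-corrected agreement between model and human rater.
n = tp + fp + fn + tn
po = (tp + tn) / n                                            # observed agreement
pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2   # chance agreement
kappa = (po - pe) / (1 - pe)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"ppv={ppv:.2f} npv={npv:.2f} kappa={kappa:.2f}")
```

Note how this toy table reproduces the abstract's qualitative pattern: very high sensitivity and NPV alongside modest specificity, low PPV, and a low Kappa, which is what happens when a screener includes liberally to avoid missing relevant records.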
Pages: 11
Related Articles
50 in total
  • [1] Methodological insights into ChatGPT’s screening performance in systematic reviews
    Issaiy, Mahbod
    Ghanaati, Hossein
    Kolahi, Shahriar
    Shakiba, Madjid
    Jalali, Amir Hossein
    Zarei, Diana
    Kazemian, Sina
    Avanaki, Mahsa Alborzi
    Firouznia, Kavous
    BMC Medical Research Methodology, 24
  • [2] Screening articles for systematic reviews with ChatGPT
    Syriani, Eugene
    David, Istvan
    Kumar, Gauransh
    JOURNAL OF COMPUTER LANGUAGES, 2024, 80
  • [3] An Empirical Study Evaluating ChatGPT's Performance in Generating Search Strategies for Systematic Reviews
    Yu, Fei
    Kincaide, Heather
    Carlson, Rebecca Beth
    Proceedings of the Association for Information Science and Technology, 2024, 61 (01) : 423 - 434
  • [4] The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation
    Gwon, Yong Nam
    Kim, Jae Heon
    Chung, Hyun Soo
    Jung, Eun Jee
    Chun, Joey
    Lee, Serin
    Shim, Sung Ryul
    JMIR MEDICAL INFORMATICS, 2024, 12
  • [5] Methodological quality of systematic reviews on influenza vaccination
    Remschmidt, Cornelius
    Wichmann, Ole
    Harder, Thomas
    VACCINE, 2014, 32 (15) : 1678 - 1684
  • [6] The methodological rigour of systematic reviews in environmental health
    Menon, J. M. L.
    Struijs, F.
    Whaley, P.
    CRITICAL REVIEWS IN TOXICOLOGY, 2022, 52 (03) : 167 - 187
  • [7] Methodological quality of systematic reviews on Chinese herbal medicine: a methodological survey
    Cheung, Andy K. L.
    Wong, Charlene H. L.
    Ho, Leonard
    Wu, Irene X. Y.
    Ke, Fiona Y. T.
    Chung, Vincent C. H.
    BMC COMPLEMENTARY MEDICINE AND THERAPIES, 2022, 22 (01)
  • [8] Methodological Quality of Systematic Reviews Addressing Orthodontic Interventions: Methodological Study
    Notaro, Sarah Queiroz
    Hermont, Ana Paula
    Cruz, Poliana Valdelice
    Maia, Raiane Machado
    Avila, Walesca Melo
    Pericic, Tina Poklepovic
    Abreu, Lucas Guimaraes
    Jiao, Ruimin
    Martins-Pfeifer, Carolina Castro
    PESQUISA BRASILEIRA EM ODONTOPEDIATRIA E CLINICA INTEGRADA, 2024, 24