Assessing the Risk of Bias in Randomized Clinical Trials With Large Language Models

Cited by: 13
Authors
Lai, Honghao [1, 2]
Ge, Long [1, 2, 3]
Sun, Mingyao [4]
Pan, Bei [5]
Huang, Jiajie [6]
Hou, Liangying [5, 7]
Yang, Qiuyu [1, 2]
Liu, Jiayi [1, 2]
Liu, Jianing [6]
Ye, Ziying [1, 2]
Xia, Danni [1, 2]
Zhao, Weilong [1, 2]
Wang, Xiaoman [5]
Liu, Ming [5, 7]
Talukdar, Jhalok Ronjan [7]
Tian, Jinhui [3, 5]
Yang, Kehu [3, 5]
Estill, Janne [5, 8]
Affiliations
[1] Lanzhou Univ, Sch Publ Hlth, Dept Hlth Policy & Management, Lanzhou, Peoples R China
[2] Lanzhou Univ, Evidence Based Social Sci Res Ctr, Sch Publ Hlth, 199 Donggang West Rd, Lanzhou 730000, Peoples R China
[3] Key Lab Evidence Based Med & Knowledge Translat Ga, Lanzhou, Peoples R China
[4] Lanzhou Univ, Evidence Based Nursing Ctr, Sch Nursing, Lanzhou, Peoples R China
[5] Lanzhou Univ, Sch Basic Med Sci, Evidence Based Med Ctr, Lanzhou, Peoples R China
[6] Gansu Univ Chinese Med, Coll Nursing, Lanzhou, Peoples R China
[7] McMaster Univ, Dept Hlth Res Methods Evidence & Impact, Hamilton, ON, Canada
[8] Univ Geneva, Inst Global Hlth, Geneva, Switzerland
Keywords
DOUBLE-BLIND; PRIMARY INSOMNIA; INTERRATER RELIABILITY; REBOUND INSOMNIA; WEIGHT-LOSS; LONG-TERM; RED MEAT; EFFICACY; SAFETY; DIET
DOI
10.1001/jamanetworkopen.2024.12687
Chinese Library Classification (CLC) Number
R5 [Internal Medicine]
Subject Classification Codes
1002; 100201
Abstract
Importance: Large language models (LLMs) may facilitate the labor-intensive process of conducting systematic reviews. However, how best to apply them, and how reliable they are, remains uncertain.

Objective: To explore the feasibility and reliability of using LLMs to assess risk of bias (ROB) in randomized clinical trials (RCTs).

Design, Setting, and Participants: This survey study was conducted between August 10, 2023, and October 30, 2023. Thirty RCTs were selected from published systematic reviews.

Main Outcomes and Measures: A structured prompt was developed to guide ChatGPT (LLM 1) and Claude (LLM 2) in assessing the ROB of these RCTs using a modified version of the Cochrane ROB tool developed by the CLARITY group at McMaster University. Each model assessed every RCT twice, and the results were documented. The results were compared with an assessment by 3 experts, which served as the criterion standard. Accuracy was measured with correct assessment rates, sensitivity, specificity, and F1 scores, both overall and for each domain of the Cochrane ROB tool; consistency was measured with consistent assessment rates and Cohen kappa; and efficiency was measured with assessment time. Performance of the 2 models was compared using risk differences.

Results: Both models demonstrated high correct assessment rates. LLM 1 reached a mean correct assessment rate of 84.5% (95% CI, 81.5%-87.3%), and LLM 2 reached a significantly higher rate of 89.5% (95% CI, 87.0%-91.8%); the risk difference between the 2 models was 0.05 (95% CI, 0.01-0.09). In most domains, domain-specific correct assessment rates were approximately 80% to 90%; however, sensitivity was below 0.80 in domains 1 (random sequence generation), 2 (allocation concealment), and 6 (other concerns), and F1 scores were below 0.50 in domains 4 (missing outcome data), 5 (selective outcome reporting), and 6. The consistent assessment rates between the 2 rounds of assessment were 84.0% for LLM 1 and 87.3% for LLM 2. Cohen kappa exceeded 0.80 in 7 domains for LLM 1 and in 8 domains for LLM 2. The mean (SD) time needed for assessment was 77 (16) seconds for LLM 1 and 53 (12) seconds for LLM 2.

Conclusions: In this survey study applying LLMs to ROB assessment, LLM 1 and LLM 2 demonstrated substantial accuracy and consistency in evaluating RCTs, suggesting their potential as supportive tools in systematic review processes.
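The abstract measures accuracy (correct assessment rate, sensitivity, specificity, F1), consistency (Cohen kappa), and a between-model risk difference, but does not state how judgments were coded or how the confidence intervals were derived. The following is a minimal Python sketch of how these quantities are conventionally computed, not the authors' method. It assumes each ROB judgment is collapsed to a binary label (1 = high risk, treated as the positive class; 0 = low risk) and assumes a Wald normal-approximation interval for the risk difference. The function names `accuracy_metrics`, `cohen_kappa`, and `risk_difference_wald` and the toy data are hypothetical.

```python
"""Hypothetical sketch of the agreement metrics named in the abstract.

Assumptions (not stated in the abstract): each ROB judgment is coded as
1 = high risk (the positive class) or 0 = low risk, and the 95% CI for a
risk difference uses the Wald normal approximation for independent samples.
"""
import math
from collections import Counter
from typing import Hashable, Sequence


def accuracy_metrics(pred: Sequence[int], truth: Sequence[int]) -> dict:
    """Compare one model's judgments (pred) with the criterion standard (truth)."""
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and of equal length")
    pairs = list(zip(pred, truth))
    tp = sum(p == 1 and t == 1 for p, t in pairs)  # high risk, correctly flagged
    tn = sum(p == 0 and t == 0 for p, t in pairs)  # low risk, correctly cleared
    fp = sum(p == 1 and t == 0 for p, t in pairs)  # flagged, but experts said low
    fn = sum(p == 0 and t == 1 for p, t in pairs)  # missed a high-risk judgment
    nan = float("nan")
    return {
        "correct_rate": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        # F1 = harmonic mean of precision and sensitivity = 2TP / (2TP + FP + FN)
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan,
    }


def cohen_kappa(round1: Sequence[Hashable], round2: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two assessment rounds of the same items."""
    n = len(round1)
    if n == 0 or n != len(round2):
        raise ValueError("rounds must be non-empty and of equal length")
    p_obs = sum(a == b for a, b in zip(round1, round2)) / n
    c1, c2 = Counter(round1), Counter(round2)
    # Expected agreement if both rounds labeled items independently at their own rates.
    p_exp = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    if p_exp == 1:
        return float("nan")  # undefined: both rounds used one identical category
    return (p_obs - p_exp) / (1 - p_exp)


def risk_difference_wald(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Risk difference p2 - p1 with a Wald CI; treats the two samples as independent."""
    p1, p2 = x1 / n1, x2 / n2
    rd = p2 - p1
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, rd - z * se, rd + z * se


if __name__ == "__main__":
    # Toy data for illustration only; these are not the study's judgments.
    truth = [1, 0, 0, 1, 0, 1, 0, 0]
    first = [1, 0, 1, 1, 0, 0, 0, 0]
    second = [1, 0, 1, 1, 0, 1, 0, 0]
    print(accuracy_metrics(first, truth))
    print(cohen_kappa(first, second))
    print(risk_difference_wald(80, 100, 90, 100))
```

Because both models assessed the same 30 RCTs, a method designed for paired proportions may be more appropriate than the independent-samples Wald interval used in this sketch; the abstract does not specify which interval the authors used.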
Pages: 12