Performance of two large language models for data extraction in evidence synthesis

Cited by: 11
Authors
Konet, Amanda [1]
Thomas, Ian [1]
Gartlehner, Gerald [1, 2]
Kahwati, Leila [1]
Hilscher, Rainer [1]
Kugley, Shannon [1]
Crotty, Karen [1]
Viswanathan, Meera [1]
Chew, Robert [1]
Affiliations
[1] RTI International, Social, Statistical & Environmental Sciences, Research Triangle Park, NC 27709, USA
[2] Danube University Krems, Department for Evidence-based Medicine and Evaluation, Krems, Austria
Keywords
accuracy; artificial intelligence; data extraction; evidence synthesis; large language models
DOI
10.1002/jrsm.1732
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Accurate data extraction is a key component of evidence synthesis and critical to valid results. The advent of publicly available large language models (LLMs) has generated interest in these tools for evidence synthesis and created uncertainty about the choice of LLM. We compare the performance of two widely available LLMs (Claude 2 and GPT-4) for extracting prespecified data elements from 10 published articles included in a previously completed systematic review. We use prompts and full study PDFs to compare the outputs from the browser versions of Claude 2 and GPT-4. GPT-4 required use of a third-party plug-in to upload and parse PDFs. Accuracy was high for Claude 2 (96.3%). The accuracy of GPT-4 with the plug-in was lower (68.8%); however, most of the errors were due to the plug-in. Both LLMs correctly recognized when prespecified data elements were missing from the source PDF and generated correct information for data elements that were not reported explicitly in the articles. A secondary analysis demonstrated that, when provided selected text from the PDFs, Claude 2 and GPT-4 accurately extracted 98.7% and 100% of the data elements, respectively. Limitations include the narrow scope of the study PDFs used, that prompt development was completed using only Claude 2, and that we cannot guarantee the open-source articles were not used to train the LLMs. This study highlights the potential for LLMs to revolutionize data extraction but underscores the importance of accurate PDF parsing. For now, it remains essential for a human investigator to validate LLM extractions.
Pages: 818-824
Page count: 7