Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain

Cited by: 20
Authors
Dennstadt, Fabio [1, 3, 4]
Zink, Johannes [2]
Putora, Paul Martin [1, 3, 4]
Hastings, Janna [5, 6, 7]
Cihoric, Nikola [3, 4]
Affiliations
[1] Cantonal Hosp St Gallen, Dept Radiat Oncol, St Gallen, Switzerland
[2] Univ Wurzburg, Inst Comp Sci, Wurzburg, Germany
[3] Bern Univ Hosp, Dept Radiat Oncol, Inselspital, Bern, Switzerland
[4] Univ Bern, Bern, Switzerland
[5] Univ Zurich, Inst Implementat Sci Hlth Care, Zurich, Switzerland
[6] Univ St Gallen, Sch Med, St Gallen, Switzerland
[7] Swiss Inst Bioinformat, Lausanne, Switzerland
Keywords
Natural language processing; Systematic literature review; Biomedicine; Title and abstract screening; Large language models; Automation
DOI
10.1186/s13643-024-02575-4
Chinese Library Classification
R5 [Internal Medicine]
Subject classification codes
1002; 100201
Abstract
Background: Systematically screening published literature to determine the relevant publications to synthesize in a review is a time-consuming and difficult task. Large language models (LLMs) are an emerging technology with promising capabilities for the automation of language-related tasks that may be useful for such a purpose.
Methods: LLMs were used as part of an automated system to evaluate the relevance of publications to a certain topic based on defined criteria and based on the title and abstract of each publication. A Python script was created to generate structured prompts consisting of text strings for instruction, title, abstract, and relevant criteria to be provided to an LLM. The relevance of a publication was evaluated by the LLM on a Likert scale (low relevance to high relevance). By specifying a threshold, different classifiers for inclusion/exclusion of publications could then be defined. The approach was used with four different openly available LLMs on ten published data sets of biomedical literature reviews and on a newly human-created data set for a hypothetical new systematic literature review.
Results: The performance of the classifiers varied depending on the LLM being used and on the data set analyzed. Regarding sensitivity/specificity, the classifiers yielded 94.48%/31.78% for the FlanT5 model, 97.58%/19.12% for the OpenHermes-NeuralChat model, 81.93%/75.19% for the Mixtral model and 97.58%/38.34% for the Platypus 2 model on the ten published data sets. The same classifiers yielded 100% sensitivity at a specificity of 12.58%, 4.54%, 62.47%, and 24.74% on the newly created data set. Changing the standard settings of the approach (minor adaption of instruction prompt and/or changing the range of the Likert scale from 1-5 to 1-10) had a considerable impact on the performance.
Conclusions: LLMs can be used to evaluate the relevance of scientific publications to a certain review topic and classifiers based on such an approach show some promising results. To date, little is known about how well such systems would perform if used prospectively when conducting systematic literature reviews and what further implications this might have. However, it is likely that in the future researchers will increasingly use LLMs for evaluating and classifying scientific publications.
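The pipeline described in the Methods (structured prompt, Likert-scale relevance score, threshold-based inclusion, sensitivity/specificity) can be illustrated with a minimal Python sketch. This is not the authors' script: the function names, the prompt layout, the score-parsing rule, and the choice to include publications whose score cannot be parsed are all assumptions made for illustration, and the LLM call itself is left out.

```python
import re
from typing import List, Optional, Sequence, Tuple


def build_prompt(instruction: str, title: str, abstract: str,
                 criteria: List[str]) -> str:
    """Assemble one structured prompt from its text components.

    The layout below is a hypothetical one, not the published prompt.
    """
    criteria_block = "\n".join(f"- {c}" for c in criteria)
    return (
        f"{instruction}\n\n"
        f"Title: {title}\n\n"
        f"Abstract: {abstract}\n\n"
        f"Relevant criteria:\n{criteria_block}\n"
    )


def parse_score(response: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Return the first integer in the response that lies on the Likert
    scale [low, high], or None if there is none (assumed parsing rule)."""
    for match in re.findall(r"\d+", response):
        value = int(match)
        if low <= value <= high:
            return value
    return None


def include(score: Optional[int], threshold: int) -> bool:
    """Include a publication when its score reaches the threshold.

    Assumption: unparseable answers (None) are included, which favors
    sensitivity over specificity.
    """
    return score is None or score >= threshold


def sensitivity_specificity(scores: Sequence[Optional[int]],
                            labels: Sequence[bool],
                            threshold: int) -> Tuple[float, float]:
    """Compare threshold decisions with human labels (True = relevant).

    Assumes at least one relevant and one irrelevant publication.
    """
    tp = fn = tn = fp = 0
    for score, relevant in zip(scores, labels):
        predicted = include(score, threshold)
        if relevant and predicted:
            tp += 1
        elif relevant:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Under these assumptions, each threshold on the same set of scores defines a different classifier: lowering the threshold can only keep or raise sensitivity and keep or lower specificity, which is the trade-off visible in the reported figures.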
Pages: 14