AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models

Cited by: 23

Authors:

Datta, Surabhi ^{[1]}

Lee, Kyeryoung ^{[1]}

Paek, Hunki ^{[1]}

Manion, Frank J. ^{[1]}

Ofoegbu, Nneka ^{[1]}

Du, Jingcheng ^{[1]}

Li, Ying ^{[2]}

Huang, Liang-Chin ^{[1]}

Wang, Jingqi ^{[1]}

Lin, Bin ^{[1]}

Xu, Hua ^{[3]}

Wang, Xiaoyan ^{[1,4]}

Affiliations:

[1] Melax Technol, Houston, TX 77030 USA

[2] Regeneron Pharmaceut, Tarrytown, NY 10591 USA

[3] Yale Sch Med, New Haven, CT 06511 USA

[4] Melax Technol, 2450 Holcombe Blvd Suite 112, Houston, TX 77030 USA

Source:

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION | 2024 / Volume 31 / Issue 02

Keywords:

GPT-4; zero-shot prompting; large language model; information extraction; clinical trial eligibility criteria; natural language processing;

DOI:

10.1093/jamia/ocad218

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Objectives: We aim to build a generalizable information extraction system leveraging large language models to extract granular eligibility criteria information for diverse diseases from free text clinical trial protocol documents. We investigate the model's capability to extract criteria entities along with contextual attributes including values, temporality, and modifiers and present the strengths and limitations of this system.

Materials and Methods: The clinical trial data were acquired from https://ClinicalTrials.gov/. We developed a system, AutoCriteria, which comprises the following modules: preprocessing, knowledge ingestion, prompt modeling based on GPT, postprocessing, and interim evaluation. The final system evaluation was performed, both quantitatively and qualitatively, on 180 manually annotated trials encompassing 9 diseases.

Results: AutoCriteria achieves an overall F1 score of 89.42 across all 9 diseases in extracting the criteria entities, with the highest being 95.44 for nonalcoholic steatohepatitis and the lowest of 84.10 for breast cancer. Its overall accuracy is 78.95% in identifying all contextual information across all diseases. Our thematic analysis indicated accurate logic interpretation of criteria as one of the strengths and overlooking/neglecting the main criteria as one of the weaknesses of AutoCriteria.

Discussion: AutoCriteria demonstrates strong potential to extract granular eligibility criteria information from trial documents without requiring manual annotations. The prompts developed for AutoCriteria generalize well across different disease areas. Our evaluation suggests that the system handles complex scenarios including multiple arm conditions and logics.

Conclusion: AutoCriteria currently encompasses a diverse range of diseases and holds potential to extend to more in the future. This signifies a generalizable and scalable solution, poised to address the complexities of clinical trial application in real-world settings.

Citation

Pages: 375-385

Page count: 11
