AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models

Times cited: 23
Authors
Datta, Surabhi [1]
Lee, Kyeryoung [1]
Paek, Hunki [1]
Manion, Frank J. [1]
Ofoegbu, Nneka [1]
Du, Jingcheng [1]
Li, Ying [2]
Huang, Liang-Chin [1]
Wang, Jingqi [1]
Lin, Bin [1]
Xu, Hua [3]
Wang, Xiaoyan [1, 4]
Affiliations
[1] Melax Technologies, Houston, TX 77030 USA
[2] Regeneron Pharmaceuticals, Tarrytown, NY 10591 USA
[3] Yale School of Medicine, New Haven, CT 06511 USA
[4] Melax Technologies, 2450 Holcombe Blvd Suite 112, Houston, TX 77030 USA
Keywords
GPT-4; zero-shot prompting; large language model; information extraction; clinical trial eligibility criteria; natural language processing
DOI
10.1093/jamia/ocad218
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objectives: We aim to build a generalizable information extraction system that leverages large language models to extract granular eligibility criteria information for diverse diseases from free-text clinical trial protocol documents. We investigate the model's capability to extract criteria entities along with their contextual attributes, including values, temporality, and modifiers, and we present the strengths and limitations of this system.

Materials and Methods: The clinical trial data were acquired from https://ClinicalTrials.gov/. We developed a system, AutoCriteria, which comprises the following modules: preprocessing, knowledge ingestion, GPT-based prompt modeling, postprocessing, and interim evaluation. The final system evaluation was performed, both quantitatively and qualitatively, on 180 manually annotated trials encompassing 9 diseases.

Results: AutoCriteria achieves an overall F1 score of 89.42 across all 9 diseases in extracting the criteria entities, with the highest score (95.44) for nonalcoholic steatohepatitis and the lowest (84.10) for breast cancer. Its overall accuracy in identifying all contextual information across all diseases is 78.95%. Our thematic analysis identified accurate interpretation of criteria logic as one of AutoCriteria's strengths and overlooking the main criteria as one of its weaknesses.

Discussion: AutoCriteria demonstrates strong potential to extract granular eligibility criteria information from trial documents without requiring manual annotations. The prompts developed for AutoCriteria generalize well across different disease areas. Our evaluation suggests that the system handles complex scenarios, including multiple-arm conditions and complex criteria logic.

Conclusion: AutoCriteria currently encompasses a diverse range of diseases and has the potential to be extended to more in the future. This points to a generalizable and scalable solution, poised to address the complexities of clinical trial applications in real-world settings.
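The Results report entity-level F1 scores pooled over many trials. As a hedged illustration of what such a pooled score measures, the sketch below computes micro-averaged precision, recall, and F1 over sets of extracted entities. It assumes strict exact matching on a hypothetical entity key of (criterion type, entity type, normalized text); the abstract does not describe AutoCriteria's annotation schema or matching rules (for example, strict versus lenient span matching), so this is not the authors' evaluation code. The function name `micro_prf` and the toy data are illustrative only.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity key: (criterion_type, entity_type, normalized_text).
# AutoCriteria's actual annotation schema is not described in the abstract.
Entity = Tuple[str, str, str]


def micro_prf(trials: Iterable[Tuple[Set[Entity], Set[Entity]]]) -> Tuple[float, float, float]:
    """Pool true positives, false positives, and false negatives over all
    trials, then compute strict (exact-match) precision, recall, and F1."""
    tp = fp = fn = 0
    for gold, predicted in trials:
        gold, predicted = set(gold), set(predicted)
        tp += len(gold & predicted)   # extracted and annotated
        fp += len(predicted - gold)   # extracted but not annotated
        fn += len(gold - predicted)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the AutoCriteria evaluation set.
    trial_a = (
        {("inclusion", "condition", "type 2 diabetes"),
         ("inclusion", "lab_test", "hba1c"),
         ("exclusion", "condition", "pregnancy")},
        {("inclusion", "condition", "type 2 diabetes"),
         ("exclusion", "condition", "pregnancy"),
         ("exclusion", "drug", "insulin")},
    )
    trial_b = ({("inclusion", "age", "18 years")}, set())
    p, r, f1 = micro_prf([trial_a, trial_b])
    # Scores are scaled by 100 to match the percentage-style figures in the abstract.
    print(f"P={100 * p:.2f} R={100 * r:.2f} F1={100 * f1:.2f}")  # P=66.67 R=50.00 F1=57.14
```

Because counts are pooled before dividing, every entity counts equally, so diseases with more annotated criteria weigh more in an overall score; a macro average would instead compute F1 per disease and then average those values.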
Pages: 375–385
Number of pages: 11