Optimizing Clinical Trial Eligibility Design Using Natural Language Processing Models and Real-World Data: Algorithm Development and Validation

Citations: 2
Authors
Lee, Kyeryoung [1]
Liu, Zongzhi [1]
Mai, Yun [1]
Jun, Tomi [1]
Ma, Meng [1]
Wang, Tongyu [1]
Ai, Lei [1]
Calay, Ediz [1]
Oh, William [1,2]
Stolovitzky, Gustavo [1]
Schadt, Eric [1,2]
Wang, Xiaoyan [1]
Affiliations
[1] GeneDx Sema4, 333 Ludlow St, Stamford, CT 06902 USA
[2] Icahn Sch Med Mt Sinai, New York, NY USA
Source
JMIR AI | 2024 / Vol. 3
Keywords
natural language processing; real-world data; clinical trial eligibility criteria; eligibility criteria-specific ontology; clinical trial protocol optimization; data-driven approach; RANDOMIZED CONTROLLED-TRIALS; CRITERIA; GENERALIZABILITY; REPRESENTATION; INFORMATION; EXTRACTION; SUCCESS; FAIL;
DOI
10.2196/50800
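Chinese Library Classification
R19 [Health care organizations and services (health services administration)];
Subject classification code
Abstract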
Background: Clinical trials are vital for developing new therapies but can also delay drug development. Efficient trial data management, optimized trial protocol, and accurate patient identification are critical for reducing trial timelines. Natural language processing (NLP) has the potential to achieve these objectives. Objective: This study aims to assess the feasibility of using data-driven approaches to optimize clinical trial protocol design and identify eligible patients. This involves creating a comprehensive eligibility criteria knowledge base integrated within electronic health records using deep learning-based NLP techniques. Methods: We obtained data on 3281 industry-sponsored phase 2 or 3 interventional clinical trials recruiting patients with non-small cell lung cancer, prostate cancer, breast cancer, multiple myeloma, ulcerative colitis, and Crohn disease from ClinicalTrials.gov, spanning the period between 2013 and 2020. A customized bidirectional long short-term memory- and conditional random field-based NLP pipeline was used to extract all eligibility criteria attributes and convert hypernym concepts into computable hyponyms along with their corresponding values. To illustrate the simulation of clinical trial design for optimization purposes, we selected a subset of patients with non-small cell lung cancer (n=2775), curated from the Mount Sinai Health System, as a pilot study. Results: We manually annotated the clinical trial eligibility corpus (485/3281, 14.78% trials) and constructed an eligibility criteria-specific ontology. Our customized NLP pipeline, developed based on the eligibility criteria-specific ontology that we created through manual annotation, achieved high precision (0.91, range 0.67-1.00) and recall (0.79, range 0.50-1) scores, as well as a high F1-score (0.83, range 0.67-1), enabling the efficient extraction of granular criteria entities and relevant attributes from 3281 clinical trials.
A standardized eligibility criteria knowledge base, compatible with electronic health records, was developed by transforming hypernym concepts into machine-interpretable hyponyms along with their corresponding values. In addition, an interface prototype demonstrated the practicality of leveraging real-world data for optimizing clinical trial protocols and identifying eligible patients. Conclusions: Our customized NLP pipeline successfully generated a standardized eligibility criteria knowledge base by transforming hypernym criteria into machine-readable hyponyms along with their corresponding values. A prototype interface integrating real-world patient information allows us to assess the impact of each eligibility criterion on the number of patients eligible for the trial. Leveraging NLP and real-world data in a data-driven approach holds promise for streamlining the overall clinical trial process, optimizing processes, and improving efficiency in patient identification.
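The abstract's idea of assessing the impact of each eligibility criterion on the number of eligible patients can be illustrated with a small leave-one-out sketch. This is not the authors' pipeline or interface: the patient records, the field names (`age`, `ecog`, `egfr`), the thresholds, and the helper functions `eligible_count` and `criterion_impact` are all hypothetical, chosen only to show how relaxing one criterion at a time changes the eligible cohort size.

```python
from typing import Callable, Dict, List

Patient = Dict[str, float]
Criterion = Callable[[Patient], bool]


def eligible_count(patients: List[Patient], criteria: Dict[str, Criterion]) -> int:
    """Count patients who satisfy every criterion in the set."""
    return sum(all(rule(p) for rule in criteria.values()) for p in patients)


def criterion_impact(patients: List[Patient], criteria: Dict[str, Criterion]) -> Dict[str, int]:
    """For each criterion, report how many extra patients become eligible
    when that single criterion is dropped and all others are kept."""
    baseline = eligible_count(patients, criteria)
    impact = {}
    for name in criteria:
        relaxed = {k: v for k, v in criteria.items() if k != name}
        impact[name] = eligible_count(patients, relaxed) - baseline
    return impact


# Hypothetical structured patient records (illustrative values only).
patients = [
    {"age": 64, "ecog": 1, "egfr": 75},  # meets all criteria
    {"age": 81, "ecog": 1, "egfr": 80},  # fails age only
    {"age": 78, "ecog": 0, "egfr": 65},  # fails age only
    {"age": 58, "ecog": 3, "egfr": 90},  # fails ECOG only
    {"age": 70, "ecog": 0, "egfr": 40},  # fails eGFR only
    {"age": 85, "ecog": 2, "egfr": 35},  # fails everything
]

# Hypothetical computable criteria (attribute plus value range).
criteria = {
    "age_18_to_75": lambda p: 18 <= p["age"] <= 75,
    "ecog_0_to_1": lambda p: p["ecog"] <= 1,
    "egfr_at_least_60": lambda p: p["egfr"] >= 60,
}

if __name__ == "__main__":
    print("eligible with all criteria:", eligible_count(patients, criteria))
    for name, gain in criterion_impact(patients, criteria).items():
        print(f"dropping {name} adds {gain} patient(s)")
```

A leave-one-out comparison is only one way to rank criteria; it ignores interactions between criteria, which a fuller simulation would need to consider.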
Pages: 22