Predicting seizure recurrence after an initial seizure-like episode from routine clinical notes using large language models: a retrospective cohort study

Cited by: 5

Authors:

Beaulieu-Jones, Brett K. ^{[1,2,7]}

Villamar, Mauricio F. ^{[3]}

Scordis, Phil

Bartmann, Ana Paula ^{[4]}

Ali, Waqar

Wissel, Benjamin ^{[4,5]}

Alsentzer, Emily ^{[2]}

de Jong, Johann ^{[6]}

Patra, Arijit

Kohane, Isaac

Affiliations:

[1] Univ Chicago, Dept Med, Chicago, IL USA

[2] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA

[3] Brown Univ, Dept Neurol, Warren Alpert Med Sch, Providence, RI USA

[4] UCB, Brussels, Belgium

[5] Cincinnati Childrens Hosp Med Ctr, Div Biomed Informat, Cincinnati, OH USA

[6] UCB Biosci, Monheim, Germany

[7] Univ Chicago, Dept Med, Chicago, IL 60637 USA

Source:

LANCET DIGITAL HEALTH | 2023 / Vol. 5 / Issue 12

Funding:

US National Institutes of Health;

Keywords:

1ST UNPROVOKED SEIZURE; ANTIEPILEPTIC DRUG-WITHDRAWAL; FEBRILE STATUS EPILEPTICUS; EPILEPSY; RISK; CHILDREN; RECOMMENDATIONS; MANAGEMENT; CHILDHOOD; OUTCOMES;

DOI:

10.1016/S2589-7500(23)00179-6

Chinese Library Classification (CLC) number:

R-058 [];

Subject classification code:

Abstract:

Background: The evaluation and management of first-time seizure-like events in children can be difficult because these episodes are not always directly observed and might be epileptic seizures or other conditions (seizure mimics). We aimed to evaluate whether machine learning models using real-world data could predict seizure recurrence after an initial seizure-like event.

Methods: This retrospective cohort study compared models trained and evaluated on two separate datasets between Jan 1, 2010, and Jan 1, 2020: electronic medical records (EMRs) at Boston Children's Hospital and de-identified, patient-level, administrative claims data from the IBM MarketScan research database. The study population comprised patients with an initial diagnosis of either epilepsy or convulsions before the age of 21 years, based on International Classification of Diseases, Clinical Modification (ICD-CM) codes. We compared machine learning-based predictive modelling using structured data (logistic regression and XGBoost) with emerging techniques in natural language processing by use of large language models.

Findings: The primary cohort comprised 14 021 patients at Boston Children's Hospital matching inclusion criteria with an initial seizure-like event and the comparison cohort comprised 15 062 patients within the IBM MarketScan research database. Seizure recurrence based on a composite expert-derived definition occurred in 57% of patients at Boston Children's Hospital and 63% of patients within IBM MarketScan. Large language models with additional domain-specific and location-specific pre-training on patients excluded from the study (F1-score 0·826 [95% CI 0·817-0·835], AUC 0·897 [95% CI 0·875-0·913]) performed best. All large language models, including the base model without additional pre-training (F1-score 0·739 [95% CI 0·738-0·741], AUROC 0·846 [95% CI 0·826-0·861]), outperformed models trained with structured data. With structured data only, XGBoost outperformed logistic regression, and models trained with the Boston Children's Hospital EMR (logistic regression: F1-score 0·650 [95% CI 0·643-0·657], AUC 0·694 [95% CI 0·685-0·705]; XGBoost: F1-score 0·679 [0·676-0·683], AUC 0·725 [0·717-0·734]) performed similarly to models trained on the IBM MarketScan database (logistic regression: F1-score 0·596 [0·590-0·601], AUC 0·670 [0·664-0·675]; XGBoost: F1-score 0·678 [0·668-0·687], AUC 0·710 [0·703-0·714]).

Interpretation: Physician's clinical notes about an initial seizure-like event include substantial signals for prediction of seizure recurrence, and additional domain-specific and location-specific pre-training can significantly improve the performance of clinical large language models, even for specialised cohorts.

Funding: UCB, National Institute of Neurological Disorders and Stroke (US National Institutes of Health).

Copyright (c) 2023 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.

Pages: E882-E894

Page count: 13
