Predicting seizure recurrence after an initial seizure-like episode from routine clinical notes using large language models: a retrospective cohort study

Cited by: 5
Authors
Beaulieu-Jones, Brett K. [1, 2, 7]
Villamar, Mauricio F. [3 ]
Scordis, Phil
Bartmann, Ana Paula [4 ]
Ali, Waqar
Wissel, Benjamin [4, 5]
Alsentzer, Emily [2 ]
de Jong, Johann [6 ]
Patra, Arijit
Kohane, Isaac
Affiliations
[1] Univ Chicago, Dept Med, Chicago, IL USA
[2] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[3] Brown Univ, Dept Neurol, Warren Alpert Med Sch, Providence, RI USA
[4] UCB, Brussels, Belgium
[5] Cincinnati Childrens Hosp Med Ctr, Div Biomed Informat, Cincinnati, OH USA
[6] UCB Biosci, Monheim, Germany
[7] Univ Chicago, Dept Med, Chicago, IL 60637 USA
Source
LANCET DIGITAL HEALTH | 2023 | Volume 5 | Issue 12
Funding
US National Institutes of Health;
Keywords
1ST UNPROVOKED SEIZURE; ANTIEPILEPTIC DRUG-WITHDRAWAL; FEBRILE STATUS EPILEPTICUS; EPILEPSY; RISK; CHILDREN; RECOMMENDATIONS; MANAGEMENT; CHILDHOOD; OUTCOMES;
DOI
10.1016/S2589-7500(23)00179-6
Chinese Library Classification number
R-058
Discipline classification code
Abstract
Background: The evaluation and management of first-time seizure-like events in children can be difficult because these episodes are not always directly observed and might be epileptic seizures or other conditions (seizure mimics). We aimed to evaluate whether machine learning models using real-world data could predict seizure recurrence after an initial seizure-like event.
Methods: This retrospective cohort study compared models trained and evaluated on two separate datasets between Jan 1, 2010, and Jan 1, 2020: electronic medical records (EMRs) at Boston Children's Hospital and de-identified, patient-level, administrative claims data from the IBM MarketScan research database. The study population comprised patients with an initial diagnosis of either epilepsy or convulsions before the age of 21 years, based on International Classification of Diseases, Clinical Modification (ICD-CM) codes. We compared machine learning-based predictive modelling using structured data (logistic regression and XGBoost) with emerging techniques in natural language processing by use of large language models.
Findings: The primary cohort comprised 14 021 patients at Boston Children's Hospital matching inclusion criteria with an initial seizure-like event and the comparison cohort comprised 15 062 patients within the IBM MarketScan research database. Seizure recurrence based on a composite expert-derived definition occurred in 57% of patients at Boston Children's Hospital and 63% of patients within IBM MarketScan. Large language models with additional domain-specific and location-specific pre-training on patients excluded from the study (F1-score 0·826 [95% CI 0·817-0·835], AUC 0·897 [95% CI 0·875-0·913]) performed best. All large language models, including the base model without additional pre-training (F1-score 0·739 [95% CI 0·738-0·741], AUROC 0·846 [95% CI 0·826-0·861]) outperformed models trained with structured data. With structured data only, XGBoost outperformed logistic regression and XGBoost models trained with the Boston Children's Hospital EMR (logistic regression: F1-score 0·650 [95% CI 0·643-0·657], AUC 0·694 [95% CI 0·685-0·705], XGBoost: F1-score 0·679 [0·676-0·683], AUC 0·725 [0·717-0·734]) performed similarly to models trained on the IBM MarketScan database (logistic regression: F1-score 0·596 [0·590-0·601], AUC 0·670 [0·664-0·675], XGBoost: F1-score 0·678 [0·668-0·687], AUC 0·710 [0·703-0·714]).
Interpretation: Physician's clinical notes about an initial seizure-like event include substantial signals for prediction of seizure recurrence, and additional domain-specific and location-specific pre-training can significantly improve the performance of clinical large language models, even for specialised cohorts.
Funding: UCB, National Institute of Neurological Disorders and Stroke (US National Institutes of Health).
Copyright © 2023 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.
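The abstract reports each model's F1-score and AUC (AUROC) with a 95% CI. As a rough illustration of how such figures can be produced, the sketch below computes both metrics and a percentile bootstrap interval using only the Python standard library. It is not the authors' code: the abstract does not say how the intervals were derived, so the bootstrap is an assumption, and the labels and scores are synthetic (the 0.57 prevalence only echoes the recurrence rate quoted above). All function names are hypothetical.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = recurrence)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """Chance that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(metric, y_true, values, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed method, not taken from the paper)."""
    if len(set(y_true)) < 2:
        raise ValueError("need both classes to resample")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # skip resamples that lost one class entirely
        stats.append(metric(yt, [values[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in data: NOT patient data, purely illustrative.
_rng = random.Random(42)
y_true = [1 if _rng.random() < 0.57 else 0 for _ in range(200)]
scores = [min(1.0, max(0.0, 0.35 + 0.3 * y + _rng.gauss(0, 0.2))) for y in y_true]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is arbitrary

f1_ci = bootstrap_ci(f1_score, y_true, y_pred)
auc_ci = bootstrap_ci(auroc, y_true, scores)
print("F1", round(f1_score(y_true, y_pred), 3), "95% CI", f1_ci)
print("AUROC", round(auroc(y_true, scores), 3), "95% CI", auc_ci)
```

The pairwise AUROC here is quadratic in sample size, which is fine for a toy example; for cohorts of the size described above, a rank-based implementation from an established library would be the more practical choice.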
Pages: E882 - E894
Page count: 13