Automating Ischemic Stroke Subtype Classification Using Machine Learning and Natural Language Processing

Times cited: 88
Authors
Garg, Ravi [1]
Oh, Elissa [1]
Naidech, Andrew [1]
Kording, Konrad [2]
Prabhakaran, Shyam [3]
Affiliations
[1] Northwestern Univ, Feinberg Sch Med, Dept Neurol, 633 St Clair St 2041, Chicago, IL 60611 USA
[2] Univ Penn, Philadelphia, PA 19104 USA
[3] Univ Chicago, Pritzker Sch Med, Dept Neurol, Chicago, IL 60611 USA
Source
JOURNAL OF STROKE & CEREBROVASCULAR DISEASES | 2019 / Vol. 28 / No. 07
Keywords
Ischemic stroke; cryptogenic; cardioembolism; natural language processing; machine learning; ETIOLOGIC CLASSIFICATION; CAUSATIVE CLASSIFICATION; TOAST; MECHANISM; CCS
DOI
10.1016/j.jstrokecerebrovasdis.2019.02.004
Chinese Library Classification number
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
Objective: The manual adjudication of disease classification is time-consuming and error-prone, and it limits scaling to large datasets. In ischemic stroke (IS), subtype classification is critical for management and outcome prediction. This study sought to combine natural language processing (NLP) of electronic health records (EHR) with machine learning methods to automate IS subtyping.
Methods: Among IS patients from an observational registry with TOAST subtyping adjudicated by board-certified vascular neurologists, we used NLP to analyze unstructured, text-based EHR data, including neurology progress notes and neuroradiology reports. We applied several feature selection methods to reduce the high dimensionality of the features and used 5-fold cross-validation to test the generalizability of our methods and minimize overfitting. We applied several machine learning methods and calculated kappa values for agreement between each machine learning approach and manual adjudication. We then performed blinded testing of the best algorithm on a held-out subset of 50 cases.
Results: Compared with manual classification, the best machine-based classification achieved a kappa of .25 using radiology reports alone, .57 using progress notes alone, and .57 using combined data. Kappa values varied by subtype, being highest for cardioembolic (.64) and lowest for cryptogenic cases (.47). In the held-out test subset, machine-based classification agreed with rater classification in 40 of 50 cases (kappa .72).
Conclusions: Automated machine learning approaches using textual data from the EHR show agreement with manual TOAST classification. The automated pipeline, if externally validated, could enable large-scale stroke epidemiology research.
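Every agreement figure in the abstract is reported as a kappa value. As an illustration only, the sketch below computes unweighted Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. It assumes the unweighted two-rater statistic (the abstract does not name the kappa variant used), and the subtype labels in the usage example are hypothetical. The helper implied_chance_agreement (a hypothetical name) inverts the formula: with 40 of 50 cases in agreement (p_o = 0.80) and kappa .72, the implied chance agreement is 2/7, about 0.29.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters labeling the same items (assumed variant)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: sum over labels of the product of each rater's marginal frequency.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined (0/0).
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def implied_chance_agreement(p_o: float, kappa: float) -> float:
    """Hypothetical helper: solve kappa = (p_o - p_e) / (1 - p_e) for p_e (kappa != 1)."""
    return (p_o - kappa) / (1.0 - kappa)


if __name__ == "__main__":
    # Hypothetical TOAST-style labels: CE = cardioembolism, LAA = large-artery
    # atherosclerosis, SVO = small-vessel occlusion, CRYPT = cryptogenic.
    manual = ["CE", "LAA", "SVO", "CRYPT", "CE", "CE"]
    machine = ["CE", "LAA", "CRYPT", "CRYPT", "CE", "LAA"]
    print(round(cohen_kappa(manual, machine), 3))  # 7/13, prints 0.538
    # Held-out figures from the abstract: 40 of 50 cases agree, kappa .72.
    print(round(implied_chance_agreement(40 / 50, 0.72), 3))  # 2/7, prints 0.286
```

In the toy example the raters agree on 4 of 6 items (p_o = 24/36) while chance agreement is 10/36, so kappa = 14/26 = 7/13. This shows why kappa sits below raw percent agreement: it discounts the matches expected from the label distributions alone.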
Pages: 2045-2051
Page count: 7
Related papers (50 in total)
  • [1] Machine Learning and Natural Language Processing for Automating Software Testing (Tutorial)
    Pezze, Mauro
    PROCEEDINGS OF THE 30TH ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2022, 2022: 1821
  • [2] Automating the Assessment of Multicultural Orientation Through Machine Learning and Natural Language Processing
    Goldberg, Simon B.
    Tanana, Michael
    Stewart, Shaakira Haywood
    Williams, Camille Y.
    Soma, Christina S.
    Atkins, David C.
    Imel, Zac E.
    Owen, Jesse
    PSYCHOTHERAPY, 2024
  • [3] Automated Genre Classification of Books Using Machine Learning and Natural Language Processing
    Gupta, Shikha
    Agarwal, Mohit
    Jain, Satbir
    2019 9TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2019), 2019: 269-272
  • [4] Resume Classification System using Natural Language Processing and Machine Learning Techniques
    Ali, Irfan
    Mughal, Nimra
    Khand, Zahid Hussain
    Ahmed, Javed
    Mujtaba, Ghulam
    MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY, 2022, 41(01): 65-79
  • [5] RESEARCH ON THE TEXT CLASSIFICATION BASED ON NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING
    Chen Keming
    Zheng Jianguo
    JOURNAL OF THE BALKAN TRIBOLOGICAL ASSOCIATION, 2016, 22(03): 2484-2494
  • [6] Prediction of Stroke Outcome Using Natural Language Processing-Based Machine Learning of Radiology Report of Brain MRI
    Heo, Tak Sung
    Kim, Yu Seop
    Choi, Jeong Myeong
    Jeong, Yeong Seok
    Seo, Soo Young
    Lee, Jun Ho
    Jeon, Jin Pyeong
    Kim, Chulho
    JOURNAL OF PERSONALIZED MEDICINE, 2020, 10(04): 1-11
  • [7] Prediction of 30-Day Readmission After Stroke Using Machine Learning and Natural Language Processing
    Lineback, Christina M.
    Garg, Ravi
    Oh, Elissa
    Naidech, Andrew M.
    Holl, Jane L.
    Prabhakaran, Shyam
    FRONTIERS IN NEUROLOGY, 2021, 12
  • [8] Requests classification in the customer service area for software companies using machine learning and natural language processing
    Arias-Barahona, Maria Ximena
    Arteaga-Arteaga, Harold Brayan
    Orozco-Arias, Simon
    Florez-Ruiz, Juan Camilo
    Valencia-Diaz, Mario Andres
    Tabares-Soto, Reinel
    PEERJ COMPUTER SCIENCE, 2023, 9
  • [9] Causative Classification of Ischemic Stroke by the Machine Learning Algorithm Random Forests
    Wang, Jianan
    Gong, Xiaoxian
    Chen, Hongfang
    Zhong, Wansi
    Chen, Yi
    Zhou, Ying
    Zhang, Wenhua
    He, Yaode
    Lou, Min
    FRONTIERS IN AGING NEUROSCIENCE, 2022, 14
  • [10] Automating sedation state assessments using natural language processing
    Conway, Aaron
    Li, Jack
    Rad, Mohammad Goudarzi
    Mafeld, Sebastian
    Taati, Babak
    JOURNAL OF NURSING SCHOLARSHIP, 2025, 57(01): 17-27