Classification of Severe Maternal Morbidity from Electronic Health Records Written in Spanish Using Natural Language Processing

Cited by: 4
Authors
Torres-Silva, Ever A. [1]
Rua, Santiago [2]
Giraldo-Forero, Andres F. [1]
Durango, Maria C. [3]
Florez-Arango, Jose F. [4]
Orozco-Duque, Andres [3]
Affiliations
[1] Inst Tecnol Metropolitano, Fac Engn, Medellin 050034, Colombia
[2] Univ Nacl Abierta & Distancia, Sch Basic Sci Technol & Engn, Bogota 111321, Colombia
[3] Inst Tecnol Metropolitano, Dept Appl Sci, Medellin 050034, Colombia
[4] Weill Cornell Med, Populat Hlth Sci, New York, NY 10065 USA
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 19
Keywords
electronic health records; machine learning; maternal health; pregnancy complications; natural language processing; word-embedding; MACHINE; EMBEDDINGS
DOI
10.3390/app131910725
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
One stepping stone for reducing maternal mortality is to identify severe maternal morbidity (SMM) using Electronic Health Records (EHRs). We aim to develop a pipeline to represent and classify the unstructured text of maternal progress notes into eight classes according to the silver labels defined by the ICD-10 codes associated with SMM. We preprocessed the text, removing protected health information (PHI) and reducing stop words. We built different pipelines to classify SMM by combining six word-embedding schemes, three different approaches for the representation of the documents (average, clustering, and principal component analysis), and five well-known machine learning classifiers. Additionally, we implemented an algorithm for typo and misspelling adjustment based on the Levenshtein distance to the Spanish Billion Word Corpus dictionary. We analyzed 43,529 documents, each constructed from an average of 4.15 progress notes, from 22,937 patients. The pipeline with the best performance was the one that included Word2Vec, typo and spelling adjustment, document representation by PCA, and an SVM classifier. We found that it is possible to identify conditions such as miscarriage complications or hypertensive disorders from clinical notes written in Spanish, with a true positive rate higher than 0.85. This is the first approach to classify SMM from the unstructured text contained in maternal EHRs, which can contribute to the solution of one of the most important public health problems in the world. Future work should test other representation and classification approaches to detect the risk of SMM.
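The typo and misspelling adjustment mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the details of the authors' algorithm, so everything here is an assumption: the function names `levenshtein` and `correct_token` are hypothetical, the four-word `VOCAB` set is a toy stand-in for the Spanish Billion Word Corpus dictionary, and the `max_distance` threshold of 2 is chosen only for illustration.

```python
from typing import Set


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def correct_token(token: str, vocabulary: Set[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word within max_distance, else the token.

    The threshold is an assumption for this sketch, not a value from the paper.
    """
    if token in vocabulary:
        return token
    best_word, best_distance = token, max_distance + 1
    for word in sorted(vocabulary):  # sorted for deterministic tie-breaking
        distance = levenshtein(token, word)
        if distance < best_distance:
            best_word, best_distance = word, distance
    return best_word


# Toy vocabulary (hypothetical); a real run would load a full dictionary.
VOCAB = {"hipertension", "aborto", "hemorragia", "paciente"}

if __name__ == "__main__":
    note = "paciente con hipertencion y hemorrajia"
    print(" ".join(correct_token(tok, VOCAB) for tok in note.split()))
```

Tokens with no vocabulary word within the threshold are left unchanged, which avoids rewriting rare but valid clinical terms. A linear scan like this would be slow against a large dictionary; an indexed lookup would be needed in practice.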
Pages: 17
Related papers
50 in total
  • [41] Psychosis Relapse Prediction Leveraging Electronic Health Records Data and Natural Language Processing Enrichment Methods
    Lee, Dong Yun
    Kim, Chungsoo
    Lee, Seongwon
    Son, Sang Joon
    Cho, Sun-Mi
    Cho, Yong Hyuk
    Lim, Jaegyun
    Park, Rae Woong
    FRONTIERS IN PSYCHIATRY, 2022, 13
  • [42] Using natural language processing of clinical text to enhance identification of opioid-related overdoses in electronic health records data
    Hazlehurst, Brian
    Green, Carla A.
    Perrin, Nancy A.
    Brandes, John
    Carrell, David S.
    Baer, Andrew
    DeVeaugh-Geiss, Angela
    Coplan, Paul M.
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2019, 28 (08) : 1143 - 1151
  • [43] Distinguishing cardiac catheter ablation energy modalities by applying natural language processing to electronic health records
    Margetta, Jamie
    Sale, Alicia
    JOURNAL OF COMPARATIVE EFFECTIVENESS RESEARCH, 2024, 13 (03)
  • [44] Detecting inpatient falls by using natural language processing of electronic medical records
    Toyabe, Shin-ichi
    BMC HEALTH SERVICES RESEARCH, 2012, 12
  • [45] Validation of Case Finding Algorithms for Hepatocellular Cancer From Administrative Data and Electronic Health Records Using Natural Language Processing
    Sada, Yvonne
    Hou, Jason
    Richardson, Peter
    El-Serag, Hashem
    Davila, Jessica
    MEDICAL CARE, 2016, 54 (02) : E9 - E14
  • [46] Information Extraction From Electronic Health Records to Predict Readmission Following Acute Myocardial Infarction: Does Natural Language Processing Using Clinical Notes Improve Prediction of Readmission?
    Brown, Jeremiah R.
    Ricket, Iben M.
    Reeves, Ruth M.
    Shah, Rashmee U.
    Goodrich, Christine A.
    Gobbel, Glen
    Stabler, Meagan E.
    Perkins, Amy M.
    Minter, Freneka
    Cox, Kevin C.
    Dorn, Chad
    Denton, Jason
    Bray, Bruce E.
    Gouripeddi, Ramkiran
    Higgins, John
    Chapman, Wendy W.
    MacKenzie, Todd
    Matheny, Michael E.
    JOURNAL OF THE AMERICAN HEART ASSOCIATION, 2022, 11 (07)
  • [47] Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review
    Sim, Jin-ah
    Huang, Xiaolei
    Horan, Madeline R.
    Stewart, Christopher M.
    Robison, Leslie L.
    Hudson, Melissa M.
    Baker, Justin N.
    Huang, I-Chan
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 146
  • [49] Keyword Extraction Algorithm for Classifying Smoking Status from Unstructured Bilingual Electronic Health Records Based on Natural Language Processing
    Bae, Ye Seul
    Kim, Kyung Hwan
    Kim, Han Kyul
    Choi, Sae Won
    Ko, Taehoon
    Seo, Hee Hwa
    Lee, Hae-Young
    Jeon, Hyojin
    APPLIED SCIENCES-BASEL, 2021, 11 (19)
  • [50] Risk prediction using natural language processing of electronic mental health records in an inpatient forensic psychiatry setting
    Duy Van Le
    Montgomery, James
    Kirkby, Kenneth C.
    Scanlan, Joel
    JOURNAL OF BIOMEDICAL INFORMATICS, 2018, 86 : 49 - 58