Classification of Severe Maternal Morbidity from Electronic Health Records Written in Spanish Using Natural Language Processing

Cited by: 4
Authors
Torres-Silva, Ever A. [1]
Rua, Santiago [2]
Giraldo-Forero, Andres F. [1]
Durango, Maria C. [3]
Florez-Arango, Jose F. [4]
Orozco-Duque, Andres [3]
Affiliations
[1] Inst Tecnol Metropolitano, Fac Engn, Medellin 050034, Colombia
[2] Univ Nacl Abierta & Distancia, Sch Basic Sci Technol & Engn, Bogota 111321, Colombia
[3] Inst Tecnol Metropolitano, Dept Appl Sci, Medellin 050034, Colombia
[4] Weill Cornell Med, Populat Hlth Sci, New York, NY 10065 USA
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 19
Keywords
electronic health records; machine learning; maternal health; pregnancy complications; natural language processing; word-embedding; MACHINE; EMBEDDINGS
DOI
10.3390/app131910725
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
One stepping stone toward reducing maternal mortality is identifying severe maternal morbidity (SMM) from electronic health records (EHRs). We aimed to develop a pipeline to represent and classify the unstructured text of maternal progress notes into eight classes according to silver labels defined by the ICD-10 codes associated with SMM. We preprocessed the text by removing protected health information (PHI) and filtering out stop words. We built pipelines to classify SMM by combining six word-embedding schemes, three approaches to document representation (averaging, clustering, and principal component analysis (PCA)), and five well-known machine learning classifiers. Additionally, we implemented an algorithm to correct typos and misspellings based on the Levenshtein distance to the Spanish Billion Word Corpus dictionary. We analyzed 43,529 documents, each built from an average of 4.15 progress notes, from 22,937 patients. The best-performing pipeline combined Word2Vec, typo and spelling correction, document representation by PCA, and a support vector machine (SVM) classifier. We found that conditions such as miscarriage complications or hypertensive disorders can be identified from clinical notes written in Spanish with a true positive rate higher than 0.85. This is the first approach to classifying SMM from the unstructured text in maternal EHRs, and it can contribute to solving one of the world's most important public health problems. Future work should test other representation and classification approaches to detect the risk of SMM.
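The typo-adjustment step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny `VOCABULARY` set stands in for the Spanish Billion Word Corpus dictionary, and the `max_distance` threshold, the tie-breaking rule, and the function names are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def correct_token(token: str, vocabulary: set, max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the token itself if none is close.

    max_distance is an assumed threshold, not a value reported in the paper.
    """
    if token in vocabulary:
        return token
    # Ties are broken alphabetically (an arbitrary, illustrative choice).
    best = min(vocabulary, key=lambda word: (levenshtein(token, word), word))
    return best if levenshtein(token, best) <= max_distance else token


# Hypothetical stand-in for the Spanish Billion Word Corpus dictionary.
VOCABULARY = {"hipertension", "embarazo", "aborto", "hemorragia", "paciente"}

note = "pasiente con hipertencion en embaraso"
corrected = [correct_token(token, VOCABULARY) for token in note.split()]
print(" ".join(corrected))  # paciente con hipertension en embarazo
```

Tokens with no vocabulary word within the threshold, such as "con" and "en" here, are left unchanged. With a full dictionary, the linear scan over the vocabulary would likely need an index or a candidate filter to stay practical.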
Pages: 17