Interpretable segmentation of medical free-text records based on word embeddings

Cited by: 6
Authors
Dobrakowski, Adam Gabriel [1]
Mykowiecka, Agnieszka [2]
Marciniak, Malgorzata [2]
Jaworski, Wojciech [1]
Biecek, Przemyslaw [1, 3]
Affiliations
[1] Univ Warsaw, Banacha 2, Warsaw, Poland
[2] Polish Acad Sci, Inst Comp Sci, Jana Kazimierza 5, Warsaw, Poland
[3] Warsaw Univ Technol, Koszykowa 75, Warsaw, Poland
Keywords
Electronic health records; Natural language processing; Text clustering; Word embeddings
DOI
10.1007/s10844-021-00659-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Medical free-text records contain a wealth of useful information that can be exploited in developing computer-supported medicine. However, extracting knowledge from unstructured text is difficult and depends on the language. In this paper, we apply Natural Language Processing methods to raw medical texts in Polish and propose a new methodology for clustering patients' visits. We (1) extract medical terminology from a corpus of free-text clinical records, (2) annotate the data with medical concepts, (3) compute vector representations of medical concepts and validate them on the proposed term analogy tasks, (4) compute visit representations as vectors, (5) introduce a new method for clustering patients' visits, and (6) apply the method to a corpus of 100,000 visits. We use several approaches to visual exploration that facilitate the interpretation of segments. With our method, we obtain stable and well-separated segments of visits, which are positively validated against final medical diagnoses. We show how an algorithm for segmenting medical free-text records may be used to aid medical doctors. In addition, we share an implementation of the described methods, with examples, as the open-source R package memr.
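Steps (3) to (5) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the memr API. The abstract does not say how visit vectors are built or which clustering algorithm is used, so the sketch assumes that a visit vector is the plain average of its concept vectors and that visits are grouped with a basic k-means. The embedding values, the concept names, and the helper names (`analogy`, `visit_vector`, `kmeans`) are all hypothetical.

```python
import math

# Hypothetical 3-dimensional concept embeddings, chosen by hand for illustration.
EMB = {
    "eye": [1.0, 0.0, 0.0],
    "conjunctivitis": [1.0, 1.0, 0.0],
    "ear": [0.0, 0.0, 1.0],
    "otitis": [0.0, 1.0, 1.0],
    "fever": [0.5, 0.2, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(a, b, c, emb):
    """Answer 'a is to b as c is to ?' with the concept nearest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [t for t in emb if t not in (a, b, c)]
    return max(candidates, key=lambda t: cosine(emb[t], target))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def visit_vector(concepts, emb):
    """Assumed visit representation: average of the visit's concept vectors."""
    return mean([emb[c] for c in concepts])


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=10):
    """Basic k-means, initialised deterministically with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        # Move each centroid to the mean of its members (keep it if empty).
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


# Toy visits, each already annotated with medical concepts (step 2).
VISITS = [
    ["eye", "conjunctivitis"],
    ["ear", "otitis"],
    ["conjunctivitis"],
    ["otitis"],
]

if __name__ == "__main__":
    print(analogy("eye", "conjunctivitis", "ear", EMB))
    print(kmeans([visit_vector(v, EMB) for v in VISITS], k=2))
```

On this toy data the analogy query returns the ear-related diagnosis and the two eye-related visits fall into one cluster, the two ear-related visits into the other. With real embeddings, the initialisation and the number of clusters would need far more care than this sketch takes.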
Pages: 447-465
Page count: 19