Extraction, Labeling, Clustering, and Semantic Mapping of Segments From Clinical Notes

Cited by: 3
Authors
Zelina, Petr [1]
Halamkova, Jana [2, 3]
Novacek, Vit [1, 2, 4]
Affiliations
[1] Masaryk Univ, Fac Informat, Brno 60177, Czech Republic
[2] Masaryk Mem Canc Inst, Dept Comprehens Canc Care, Brno 65653, Czech Republic
[3] Masaryk Univ, Fac Med, Brno 60177, Czech Republic
[4] NUI Galway, Data Sci Inst, Galway H91 TK33, Ireland
Keywords
Task analysis; Semantics; Feature extraction; Ontologies; Nanobioscience; Measurement; Clinical diagnosis; Text categorization; Information retrieval; NLP; EHR; clinical notes; information extraction; text classification
DOI
10.1109/TNB.2023.3275195
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
This work is motivated by the scarcity of tools for accurate, unsupervised information extraction from unstructured clinical notes in computationally underrepresented languages, such as Czech. We introduce a stepping stone to a broad array of downstream tasks such as summarisation or integration of individual patient records, extraction of structured information for national cancer registry reporting or building of semi-structured semantic patient representations that can be used for computing patient embeddings. More specifically, we present a method for unsupervised extraction of semantically-labeled textual segments from clinical notes and test it out on a dataset of Czech breast cancer patients, provided by Masaryk Memorial Cancer Institute (the largest Czech hospital specialising exclusively in oncology). Our goal was to extract, classify (i.e. label) and cluster segments of the free-text notes that correspond to specific clinical features (e.g., family background, comorbidities or toxicities). Finally, we propose a tool for computer-assisted semantic mapping of segment types to pre-defined ontologies and validate it on a downstream task of category-specific patient similarity. The presented results demonstrate the practical relevance of the proposed approach for building more sophisticated extraction and analytical pipelines deployed on Czech clinical notes.
Pages: 781-788
Page count: 8