Extraction and Quantification of Words Representing Degrees of Diseases: Combining the Fuzzy C-Means Method and Gaussian Membership

Times cited: 0
Authors
Han, Feng [1 ]
Zhang, ZiHeng [1 ]
Zhang, Hongjian [2 ]
Nakaya, Jun [3 ]
Kudo, Kohsuke [4 ]
Ogasawara, Katsuhiko [2 ]
Affiliations
[1] Hokkaido Univ, Grad Sch Med, Sapporo, Hokkaido, Japan
[2] Hokkaido Univ, Grad Sch Hlth Sci Med Management & Informat, N12 W5, Sapporo, Hokkaido 0600812, Japan
[3] Hokkaido Univ, Grad Sch Med, Div Adv Diagnost Imaging Dev, Sapporo, Hokkaido, Japan
[4] Hokkaido Univ, Fac Med, Dept Diagnost Imaging, Sapporo, Hokkaido, Japan
Funding
Japan Society for the Promotion of Science (JSPS);
Keywords
medical text; fuzzy c-means; cluster; algorithm; machine learning; word quantification; fuzzification; Gauss; radiology; medical report; documentation; text mining; data mining; extraction; unstructured; free text; quantification; fuzzy; diagnosis; diagnostic; EHR; support system; CARE;
DOI
10.2196/38677
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Abstract
Background: With the growth of medical data, a large amount of clinical data has been generated. These unstructured data contain substantial information, and extracting useful knowledge from them to support scientific decisions in diagnosing and treating diseases has become increasingly necessary. Unstructured data, such as the free text in the Medical Information Mart for Intensive Care III (MIMIC-III) data set, contain many ambiguous words that reflect the subjectivity of doctors, for example in descriptions of patient symptoms. These data could be used to further improve the accuracy of assessments made by medical diagnostic systems. To the best of our knowledge, there is currently no method for extracting the subjective words that express the extent of these symptoms (hereinafter, "degree words").

Objective: We therefore propose using the fuzzy c-means (FCM) method and Gaussian membership to quantify the degree words in the clinical medical data set MIMIC-III.

Methods: First, we preprocessed the 381,091 radiology reports collected in MIMIC-III, and then we used the FCM method to extract degree words from the unstructured text. We then used the Gaussian membership method to quantify the extracted degree words, transforming the fuzzy words extracted from the medical text into numbers that a computer can process.

Results: The results showed that the digitization of ambiguous words in medical texts is feasible. The words representing each degree of each disease had a range of corresponding values. Examples of membership medians were 2.971 (atelectasis), 3.121 (pneumonia), 2.899 (pneumothorax), 3.051 (pulmonary edema), and 2.435 (pulmonary embolus). Additionally, the words extracted for every disease included the same subjective words (low, high, etc), which allows for an objective evaluation method. In subsequent studies, we will verify the specific impact of quantifying ambiguous words, such as symptom words and degree words, on the use of medical texts.
These ambiguous words may also serve as a new set of feature values for representing the disorders.

Conclusions: This study proposes an innovative method for handling subjective words. We used the FCM method to extract subjective degree words from the English-language radiology reports in MIMIC-III and then used Gaussian functions to quantify them. With this method, subjective words in unstructured text can be processed automatically and transformed into numerical ranges. We conclude that the digitization of ambiguous words in medical texts is feasible.
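The abstract names two steps, FCM clustering and Gaussian membership, without stating their formulas. The Python sketch below shows the textbook forms of both, under stated assumptions. It uses fuzzy c-means with fuzzifier m, where the centers are c_j = Σ_i u_ij^m x_i / Σ_i u_ij^m and the memberships are u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)). It also uses the Gaussian membership function μ(v) = exp(-(v - c)^2 / (2σ^2)). It is not the authors' implementation. The input features, the number of clusters, m = 2, the fuzzy-weighted width rule in the hypothetical helper `fuzzy_sigmas`, and the toy `values` are all assumptions. The toy numbers are synthetic and do not come from MIMIC-III.

```python
import numpy as np


def fuzzy_c_means(x, n_clusters, m=2.0, max_iter=500, tol=1e-6, seed=0):
    """Standard fuzzy c-means on x of shape (n_samples, n_features).

    Returns (centers, u): centers has shape (n_clusters, n_features) and
    u[i, j] is the membership of sample i in cluster j (rows sum to 1).
    """
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)  # random initial fuzzy partition
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        um = u ** m
        # Centers are membership-weighted means of the samples.
        centers = (um.T @ x) / um.sum(axis=0)[:, None]
        # Sample-to-center distances, floored to avoid division by zero.
        d = np.fmax(np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2), 1e-12)
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        new_u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** p).sum(axis=2)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    um = u ** m
    centers = (um.T @ x) / um.sum(axis=0)[:, None]  # centers for the final partition
    return centers, u


def gaussian_membership(v, center, sigma):
    """Gaussian membership function exp(-(v - c)^2 / (2 * sigma^2))."""
    return np.exp(-((v - center) ** 2) / (2.0 * sigma ** 2))


def fuzzy_sigmas(values, centers, u, m=2.0):
    """Hypothetical width rule: fuzzy-weighted standard deviation of each 1-D cluster."""
    um = u ** m
    sq = (values[:, None] - centers[None, :]) ** 2
    return np.sqrt((um * sq).sum(axis=0) / um.sum(axis=0))


# Synthetic 1-D scores, for illustration only (not data from the study).
values = np.array([1.0, 1.1, 0.9, 3.0, 3.2, 2.8, 5.0, 5.1, 4.9])
centers, u = fuzzy_c_means(values[:, None], n_clusters=3)
order = np.argsort(centers[:, 0])
centers, u = centers[order, 0], u[:, order]  # sort clusters from low to high
sigmas = fuzzy_sigmas(values, centers, u)

# Degree of membership of a new (hypothetical) score in each Gaussian fuzzy set.
query = 2.9
degrees = gaussian_membership(query, centers, sigmas)
for c, s, g in zip(centers, sigmas, degrees):
    print(f"center={c:.3f} sigma={s:.3f} membership({query})={g:.3f}")
```

FCM is sensitive to initialization, so this sketch fixes `seed`. In practice, one would typically run several initializations and keep the partition with the lowest objective value.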
Pages: 9