Automated grouping of medical codes via multiview banded spectral clustering

Cited by: 6
Authors
Zhang, Luwan [1]
Zhang, Yichi [2]
Cai, Tianrun [3, 4]
Ahuja, Yuri [1]
He, Zeling [3, 4]
Ho, Yuk-Lam [4]
Beam, Andrew [5]
Cho, Kelly [4, 7, 8]
Carroll, Robert [6]
Denny, Joshua [6]
Kohane, Isaac [5]
Liao, Katherine [3, 4, 5]
Cai, Tianxi [1, 4, 5]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[2] Univ Rhode Isl, Dept Comp Sci & Stat, Kingston, RI 02881 USA
[3] Brigham & Womens Hosp, Div Rheumatol, 75 Francis St, Boston, MA 02115 USA
[4] VA Boston Healthcare Syst, Div Populat Hlth & Data Sci, MAVERIC, Boston, MA USA
[5] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[6] Vanderbilt Univ, Dept Biomed Informat, Nashville, TN USA
[7] Brigham & Womens Hosp, Div Aging, 75 Francis St, Boston, MA 02115 USA
[8] Harvard Med Sch, Dept Med, Boston, MA 02115 USA
Keywords
Electronic health records (EHR); Data-driven grouping; Multiple data sources; International Classification of Disease (ICD); Spectral clustering; Phenome-wide association; Biobank; Health
DOI
10.1016/j.jbi.2019.103322
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Objective: With their increasingly widespread adoption, electronic health records (EHR) have enabled phenotypic information extraction at an unprecedented granularity and scale. However, a medical concept (e.g. diagnosis, prescription, symptom) is often described by various synonyms across different EHR systems, hindering data integration for signal enhancement and complicating dimensionality reduction for knowledge discovery. Despite existing ontologies and hierarchies, tremendous human effort is needed for curation and maintenance, a process that is both unscalable and susceptible to subjective biases. This paper aims to develop a data-driven approach that automates the grouping of medical terms into clinically relevant concepts by combining multiple up-to-date data sources in an unbiased manner.
Methods: We present a novel data-driven grouping approach, multi-view banded spectral clustering (mvBSC), which combines summary data from multiple healthcare systems. The proposed method consists of a banding step that leverages the prior knowledge from the existing coding hierarchy, and a combining step that performs spectral clustering on an optimally weighted matrix.
Results: We apply the proposed method to group ICD-9 and ICD-10-CM codes together by integrating data from two healthcare systems. We show grouping results and hierarchies for 13 representative disease categories. Individual grouping qualities were evaluated using normalized mutual information, adjusted Rand index, and F1-measure, and were found to consistently exhibit great similarity to their existing manual grouping counterparts. The resulting ICD groupings also enjoy comparable interpretability and are well aligned with the current ICD hierarchy.
Conclusion: The proposed approach, by systematically leveraging multiple data sources, is able to overcome bias while maximizing consensus to achieve generalizability. It is efficient, scalable, and adaptive to the evolving human knowledge reflected in the data, and it represents a significant step toward automating medical knowledge integration.
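The two steps named in the Methods paragraph (banding, then spectral clustering on a weighted combination of views) can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not specify the banding rule or how the optimal weights are chosen, so the sketch assumes that codes are ordered as in the coding hierarchy, that banding zeroes similarities between codes more than `bandwidth` positions apart, and that the weights are supplied by the caller. All function names and the toy data are hypothetical.

```python
import numpy as np

def band(similarity, bandwidth):
    """Assumed banding rule: keep entries within `bandwidth` of the diagonal."""
    idx = np.arange(similarity.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= bandwidth
    return np.where(mask, similarity, 0.0)

def combine(views, weights):
    """Weighted average of the views; weights are user-supplied, not optimized."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * v for wi, v in zip(w, views))

def spectral_embed(matrix, k):
    """Eigenvectors for the k largest eigenvalues of the symmetrized matrix."""
    sym = (matrix + matrix.T) / 2.0
    _, eigvecs = np.linalg.eigh(sym)  # eigenvalues come back in ascending order
    return eigvecs[:, -k:]

def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def mv_banded_spectral_clustering(views, weights, bandwidth, k):
    """Band each view, combine them, then cluster the spectral embedding."""
    banded = [band(v, bandwidth) for v in views]
    combined = combine(banded, weights)
    return kmeans(spectral_embed(combined, k), k)

def toy_view(within, between, n_per_group=3):
    """Hypothetical similarity matrix with two groups of adjacent codes."""
    groups = np.repeat([0, 1], n_per_group)
    s = np.where(groups[:, None] == groups[None, :], within, between).astype(float)
    np.fill_diagonal(s, 1.0)
    return s

# Two made-up "healthcare system" views of the same six codes.
views = [toy_view(1.0, 0.2), toy_view(0.9, 0.1)]
labels = mv_banded_spectral_clustering(views, weights=[0.5, 0.5], bandwidth=2, k=2)
print(labels)
```

In this toy setting the first three codes and the last three codes land in separate clusters. A realistic use would replace the made-up matrices with similarity summaries derived from each healthcare system and would choose the weights and bandwidth by a data-driven criterion, which the abstract does not detail.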
Pages: 8