ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis

Times cited: 0
Authors
Gan, Ziming [1 ]
Zhou, Doudou [2 ]
Rush, Everett [3 ]
Panickan, Vidul A. [4 ,5 ]
Ho, Yuk-Lam [5 ]
Ostrouchov, George [3 ]
Xu, Zhiwei [6 ]
Shen, Shuting [7 ]
Xiong, Xin [8 ]
Greco, Kimberly F. [8 ]
Hong, Chuan [7 ]
Bonzel, Clara-Lea [4 ]
Wen, Jun [4 ]
Costa, Lauren [5 ]
Cai, Tianrun [5 ,9 ]
Begoli, Edmon
Xia, Zongqi [10 ]
Gaziano, J. Michael [5 ,9 ]
Liao, Katherine P. [5 ,9 ]
Cho, Kelly [5 ,9 ]
Cai, Tianxi [4 ,5 ,8 ]
Lu, Junwei [5 ,8 ]
Affiliations
[1] Univ Chicago, Dept Stat, 5801 S Ellis Ave, Chicago, IL 60615 USA
[2] Natl Univ Singapore, Dept Stat & Data Sci, Singapore 117546, Singapore
[3] Oak Ridge Natl Lab, Bethel Valley Rd, Oak Ridge, TN 37830 USA
[4] Harvard Med Sch, 25 Shattuck St, Boston, MA 02115 USA
[5] VA Boston Healthcare Syst, 150 S Huntington Ave, Boston, MA 02130 USA
[6] Univ Michigan, Dept Stat, 500 S State St, Ann Arbor, MI 48109 USA
[7] Duke Univ, Dept Biostat & Bioinformat, 1121 West Main St, Durham, NC 27708 USA
[8] Harvard TH Chan Sch Publ Hlth, 677 Huntington Ave, Boston, MA 02115 USA
[9] Brigham & Womens Hosp, 60 Fenwood Rd, Boston, MA 02115 USA
[10] Univ Pittsburgh, Clin & Translat Sci, 3501 Fifth Ave, Pittsburgh, PA 15260 USA
Keywords
Electronic health records; Natural language processing; Representation learning; Knowledge graph; ALZHEIMER-DISEASE; IDENTIFY; MODERATE; RISK;
DOI
10.1016/j.jbi.2024.104761
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Objective: Electronic health record (EHR) systems contain a wealth of clinical data, stored both as codified data and as free-text narrative notes (the source of natural language processing, or NLP, features). The complexity of EHR data poses challenges for feature representation, information extraction, and uncertainty quantification. To address these challenges, we propose an efficient Aggregated naRrative Codified Health (ARCH) records analysis that generates a large-scale knowledge graph (KG) covering a comprehensive set of codified and narrative EHR features.
Methods: Using data from 12.5 million Veterans Affairs patients, ARCH first derives embedding vectors and computes similarities, with associated p-values, that measure the strength of relatedness between clinical features while quantifying statistical uncertainty. Next, ARCH performs a sparse embedding regression that removes indirect links between features, yielding a sparse KG. Finally, ARCH was validated on several clinical tasks: detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, and subtyping patients with Alzheimer's disease.
Results: ARCH produces high-quality clinical embeddings and a KG for over 60,000 codified and narrative EHR concepts. The KG and embeddings are visualized in an R Shiny-powered web API. ARCH achieved high accuracy in detecting relationships between EHR concepts, with AUCs of 0.926 (codified) and 0.861 (NLP) for similar concept pairs and 0.810 (codified) and 0.843 (NLP) for related pairs. It detected drug side effects with an AUC of 0.723, which improved to 0.826 after fine-tuning. Combining codified and NLP features significantly increased detection power. Compared with other methods, ARCH achieved superior accuracy and improved the performance of weakly supervised phenotyping algorithms. Notably, it successfully categorized patients with Alzheimer's disease into two subgroups with different mortality rates.
Conclusion: The proposed ARCH algorithm generates large-scale, high-quality semantic representations and a knowledge graph for both codified and NLP EHR features, which are useful for a wide range of predictive modeling tasks.
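The abstract names ARCH's computational steps (embedding similarities with p-values, sparse embedding regression to drop indirect links, and AUC-based validation) but does not give their formulas. The following is a minimal, hypothetical Python sketch of those ideas on toy vectors, not the authors' implementation. Assumptions, none of which come from the record: cosine similarity is the similarity measure; an empirical p-value against similarities of presumably unrelated ("null") pairs stands in for ARCH's own uncertainty quantification; and "sparse embedding regression" is read as a lasso regression of one feature's embedding on all other embeddings, solved by coordinate descent, with nonzero coefficients kept as direct KG edges. All function names are illustrative.

```python
import math

# Hypothetical sketch: names and modeling choices are illustrative assumptions,
# not ARCH's published implementation.


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def empirical_p_value(sim, null_sims):
    """One-sided empirical p-value: smoothed share of null-pair similarities >= sim.

    Assumption: a stand-in for ARCH's own uncertainty quantification.
    """
    exceed = sum(1 for s in null_sims if s >= sim)
    return (exceed + 1) / (len(null_sims) + 1)


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso(target, predictors, lam, n_iter=200):
    """Coordinate descent for 0.5 * ||y - sum_j b_j x_j||^2 + lam * sum_j |b_j|."""
    beta = [0.0] * len(predictors)
    residual = list(target)  # y - X b, with b = 0 initially
    col_sq = [sum(x * x for x in p) for p in predictors]
    for _ in range(n_iter):
        for j, x_j in enumerate(predictors):
            if col_sq[j] == 0.0:
                continue
            # Correlation of x_j with the partial residual that excludes feature j.
            rho = sum(x * r for x, r in zip(x_j, residual)) + col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                residual = [r - delta * x for r, x in zip(residual, x_j)]
                beta[j] = new_b
    return beta


def sparse_neighbors(embeddings, i, lam):
    """Assumed reading of sparse embedding regression: regress embedding i on all
    other embeddings and keep features with nonzero coefficients as KG edges."""
    others = [j for j in range(len(embeddings)) if j != i]
    beta = lasso(embeddings[i], [embeddings[j] for j in others], lam)
    return [j for j, b in zip(others, beta) if abs(b) > 1e-8]


def auc(pos_scores, neg_scores):
    """AUC as the probability that a known related pair outscores an unrelated one (ties = 1/2)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


# Toy embeddings: feature 2 resembles feature 0 only through feature 1.
emb = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
print(round(cosine(emb[0], emb[2]), 3))  # → 0.577 (marginal similarity of an indirect pair)
print(sparse_neighbors(emb, 2, lam=0.1))  # → [1] (only the direct neighbor survives)
print(empirical_p_value(0.9, [0.1, 0.2, 0.95]))  # → 0.5
print(auc([0.9, 0.8], [0.1, 0.85]))  # → 0.75
```

In the toy example, feature 2 has a clearly positive cosine similarity with feature 0, yet the lasso with penalty 0.1 keeps only feature 1 as a neighbor. This illustrates how a sparse regression can prune an indirect link that marginal similarity alone would report; the penalty value here is arbitrary and would need tuning on real embeddings.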
Pages: 11