Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data

Cited by: 30
Authors
Hong, Chuan [1 ,2 ]
Rush, Everett [3 ]
Liu, Molei [4 ]
Zhou, Doudou [5 ]
Sun, Jiehuan [6 ]
Sonabend, Aaron [4 ]
Castro, Victor M. [7 ]
Schubert, Petra [2 ]
Panickan, Vidul A. [1 ]
Cai, Tianrun [2 ,7 ]
Costa, Lauren [2 ]
He, Zeling [7 ]
Link, Nicholas [2 ]
Hauser, Ronald [8 ]
Gaziano, J. Michael [1 ,2 ,9 ]
Murphy, Shawn N. [7 ]
Ostrouchov, George [3 ]
Ho, Yuk-Lam [2 ]
Begoli, Edmon [3 ]
Lu, Junwei [2 ,4 ]
Cho, Kelly [1 ,2 ,9 ]
Liao, Katherine P. [1 ,2 ,9 ]
Cai, Tianxi [1 ,2 ,4 ]
Affiliations
[1] Harvard Med Sch, Boston, MA 02115 USA
[2] VA Boston Healthcare Syst, Boston, MA 02130 USA
[3] Oak Ridge Natl Lab, Dept Energy, Oak Ridge, TN USA
[4] Harvard TH Chan Sch Publ Hlth, Boston, MA 02115 USA
[5] Univ Calif Davis, Davis, CA 95616 USA
[6] Univ Illinois, Chicago, IL USA
[7] Mass Gen Brigham, Boston, MA USA
[8] West Haven VA Med Ctr, West Haven, CT USA
[9] Brigham & Womens Hosp, 75 Francis St, Boston, MA 02115 USA
Keywords
RxNorm
DOI
10.1038/s41746-021-00519-z
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, the large number of available codes makes it difficult to identify all codes relevant to a phenotype. Traditional data mining approaches often require patient-level data, which hinders data sharing across institutions. In this project, we demonstrate that multi-center, large-scale code embeddings can be used to efficiently identify features relevant to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from the EHRs of two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. We also developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared with lists of codified data generated by domain experts. Features identified via KESER yielded performance comparable to that obtained with features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately than maps built from single-institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses, providing a significant advance in enabling multi-center studies using EHR data.
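The abstract does not spell out how the sparse regression is set up, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each code has an embedding vector, regresses the target code's embedding on the embeddings of all other codes with an L1 (lasso) penalty solved by iterative soft-thresholding, and treats codes with non-zero coefficients as selected features. The function name, the penalty value `lam`, the code labels, and the toy embeddings are all hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_embedding_regression(embeddings, target, lam=0.05, n_iter=500):
    """Lasso-regress the target code's embedding on all other codes' embeddings.

    Minimizes 0.5 * ||y - X b||^2 + lam * ||b||_1 with ISTA and returns
    {code_index: coefficient} for every code with a non-zero coefficient.
    Illustrative assumption only; the source does not specify the solver.
    """
    y = embeddings[target]                        # shape (d,)
    others = [i for i in range(len(embeddings)) if i != target]
    X = embeddings[others].T                      # shape (d, n_codes - 1)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)      # 1 / Lipschitz constant
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)               # gradient of the squared loss
        beta = soft_threshold(beta - step * grad, step * lam)
    return {others[j]: float(b) for j, b in enumerate(beta) if b != 0.0}


# Toy, hand-made embeddings (hypothetical): codes 1-4 are orthogonal unit
# vectors, and the target code 0 is a mix of codes 1 and 2 only.
labels = ["target_disease", "code_A", "code_B", "code_C", "code_D"]
basis = np.eye(4)
embeddings = np.vstack([0.8 * basis[0] + 0.6 * basis[1], basis])

selected = sparse_embedding_regression(embeddings, target=0, lam=0.05)
for idx, coef in selected.items():
    print(f"{labels[idx]}: {coef:.3f}")
```

Because only embedding vectors enter the regression, a procedure of this kind needs no patient-level records at analysis time, which is the property the abstract emphasizes. In the toy data, the unrelated codes receive exactly zero coefficients and drop out of the selection.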
Pages: 11