High-throughput multimodal automated phenotyping (MAP) with application to PheWAS

Cited by: 61
Authors
Liao, Katherine P. [1, 2, 3]
Sun, Jiehuan [3, 4]
Cai, Tianrun A. [1, 2, 3]
Link, Nicholas [3]
Hong, Chuan [2, 3, 4]
Huang, Jie [2]
Huffman, Jennifer E. [3]
Gronsbell, Jessica [5]
Zhang, Yichi [4, 6]
Ho, Yuk-Lam [3]
Castro, Victor [7]
Gainer, Vivian [7]
Murphy, Shawn N. [2, 7, 8]
O'Donnell, Christopher J. [1, 3]
Gaziano, J. Michael [1, 2, 3]
Cho, Kelly [1, 2, 3]
Szolovits, Peter [9]
Kohane, Isaac S. [2]
Yu, Sheng [10, 11, 12]
Cai, Tianxi [2, 3, 4]
Affiliations
[1] Brigham & Womens Hosp, Div Rheumatol Immunol & Allergy, 75 Francis St, Boston, MA 02115 USA
[2] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[3] VA Boston Healthcare Syst, Div Data Sci, Boston, MA USA
[4] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
[5] Verily Life Sci, Cambridge, MA USA
[6] Univ Rhode Isl, Kingston, RI 02881 USA
[7] Partners Healthcare Syst, Somerville, MA USA
[8] Massachusetts Gen Hosp, Boston, MA 02114 USA
[9] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[10] Tsinghua Univ, Ctr Stat Sci, Beijing, Peoples R China
[11] Tsinghua Univ, Dept Ind Engn, Beijing, Peoples R China
[12] Tsinghua Univ, Inst Data Sci, Beijing, Peoples R China
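Funding
National Key Research and Development Program of China; National Natural Science Foundation of China; US National Institutes of Health;
Keywords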
High-throughput; phenotyping; PheWAS; ELECTRONIC MEDICAL-RECORDS; ICD-9-CM CODES; HEALTH; ASSOCIATION; ALGORITHMS; CLASSIFICATION; IDENTIFICATION; VALIDATION; DISEASE;
DOI
10.1093/jamia/ocz066
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objective: Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP).

Materials and Methods: We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the Unified Medical Language System. Along with health care utilization, aggregated ICD and NLP counts were jointly analyzed by fitting an ensemble of latent mixture models. The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying participants with phenotype yes/no. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in an independent cohort phenome-wide association studies (PheWAS) for 2 single nucleotide polymorphisms with known associations.

Results: The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUC(MAP) 0.943, AUC(manual) 0.941). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes.

Conclusion: The MAP approach increased the accuracy of phenotype definition while maintaining scalability, thereby facilitating use in studies requiring large-scale phenotyping, such as PheWAS.
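
The abstract's central idea, fitting latent mixture models to aggregated ICD and NLP counts without labels and combining them into one probability per patient, can be illustrated with a small sketch. This is not the published MAP implementation: it assumes two-component Gaussian mixtures on log(1 + count), an unweighted average as the ensemble rule, and a fixed 0.5 cutoff, and it omits the health care utilization adjustment the abstract mentions. The function names and the toy counts are hypothetical.

```python
import math


def high_component_posterior(values, n_iter=200):
    """Fit a two-component 1D Gaussian mixture by EM (no labels needed).

    Returns, for each value, the posterior probability of belonging to the
    component with the larger mean (read here as "phenotype present").
    """
    n = len(values)
    mean = sum(values) / n
    pooled_var = max(sum((v - mean) ** 2 for v in values) / n, 1e-3)
    mu = [min(values), max(values)]   # start the components at the extremes
    var = [pooled_var, pooled_var]
    weight = [0.5, 0.5]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: responsibility of the second component for each value
        resp = []
        for v in values:
            dens = [
                weight[k] * math.exp(-((v - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in (0, 1)
            ]
            total = dens[0] + dens[1]
            resp.append(dens[1] / total if total > 0 else 0.5)
        # M-step: re-estimate weights, means, and (floored) variances
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component has collapsed; stop updating
        mu = [
            sum((1 - r) * v for r, v in zip(resp, values)) / n0,
            sum(r * v for r, v in zip(resp, values)) / n1,
        ]
        var = [
            max(sum((1 - r) * (v - mu[0]) ** 2 for r, v in zip(resp, values)) / n0, 1e-3),
            max(sum(r * (v - mu[1]) ** 2 for r, v in zip(resp, values)) / n1, 1e-3),
        ]
        weight = [n0 / n, n1 / n]
    if mu[1] < mu[0]:
        resp = [1 - r for r in resp]  # make sure we report the high-mean component
    return resp


def map_like_probabilities(icd_counts, nlp_counts):
    """Average the per-feature mixture posteriors (assumed ensemble rule)."""
    features = [
        [math.log1p(c) for c in icd_counts],
        [math.log1p(c) for c in nlp_counts],
    ]
    posteriors = [high_component_posterior(f) for f in features]
    return [sum(p) / len(p) for p in zip(*posteriors)]


# Toy data (hypothetical): per-patient counts of the phenotype's ICD codes
# and of NLP mentions of the phenotype concept in clinical notes.
icd = [0, 0, 1, 0, 1, 0, 8, 12, 9, 15]
nlp = [0, 1, 0, 0, 2, 1, 10, 7, 14, 20]
probs = map_like_probabilities(icd, nlp)
labels = [p >= 0.5 for p in probs]  # 0.5 is an assumed cutoff, not MAP's threshold
print([round(p, 3) for p in probs])
print(labels)
```

Averaging over several mixture fits is what makes the approach an ensemble: a patient with few ICD codes but many NLP mentions still receives an intermediate probability rather than being ruled out by one data source.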
Pages: 1255-1262
Page count: 8