Improving the phenotype risk score as a scalable approach to identifying patients with Mendelian disease

Cited by: 29
Authors
Bastarache, Lisa [1 ]
Hughey, Jacob J. [1 ]
Goldstein, Jeffrey A. [2 ]
Bastarache, Julie A. [3 ,4 ,5 ]
Das, Satya [3 ]
Zaki, Neil Charles [6 ]
Zeng, Chenjie [3 ]
Tang, Leigh Anne [1 ]
Roden, Dan M. [1 ,3 ,7 ]
Denny, Joshua C. [1 ,3 ]
Affiliations
[1] Vanderbilt Univ, Dept Biomed Informat, Med Ctr, Nashville, TN 37203 USA
[2] Northwestern Univ, Dept Pathol, Chicago, IL 60611 USA
[3] Vanderbilt Univ, Dept Med, Med Ctr, Nashville, TN 37203 USA
[4] Vanderbilt Univ, Dept Cell & Dev Biol, Med Ctr, Nashville, TN 37203 USA
[5] Vanderbilt Univ, Dept Pathol Microbiol & Immunol, Med Ctr, Nashville, TN 37203 USA
[6] Vanderbilt Univ, Med Ctr, Dept Pediat, Nashville, TN 37203 USA
[7] Vanderbilt Univ, Dept Clin Pharmacol, Med Ctr, Nashville, TN 37203 USA
Keywords
Electronic health record; Data mining; Mendelian genetics; Diagnosis; RECORD DATA; ASSOCIATION; BIOBANK;
DOI
10.1093/jamia/ocz179
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objective: The Phenotype Risk Score (PheRS) is a method to detect Mendelian disease patterns using phenotypes from the electronic health record (EHR). We compared the performance of different approaches to mapping EHR phenotypes to Mendelian disease features. Materials and Methods: PheRS uses Mendelian disease descriptions annotated with Human Phenotype Ontology (HPO) terms. In previous work, we presented a map linking phecodes (based on International Classification of Diseases [ICD]-Ninth Revision) to HPO terms. For this study, we integrated ICD-Tenth Revision codes and lab data. We also created a new map linking HPO terms to customized groupings of ICD codes. We compared performance using cases and controls for 16 Mendelian diseases drawn from 2.5 million de-identified medical records. Results: PheRS effectively distinguished cases from controls for all 15 positive controls and all approaches tested (P < 4 × 10^-16). Adding lab data led to a statistically significant improvement for 4 of 14 diseases. The custom ICD groupings improved specificity, leading to an average 8% increase in precision at 100 (range, -2% to 22%). Eight of 10 adults with cystic fibrosis tested had a PheRS in the 95th percentile prior to diagnosis. Discussion: Both phecodes and custom ICD groupings were able to detect differences between affected cases and controls at the population level. The ICD map showed better precision for the highest-scoring individuals. Adding lab data improved performance at detecting population-level differences. Conclusions: PheRS is a scalable method to study Mendelian disease at the population level using EHR data and can potentially be used to find patients with undiagnosed Mendelian disease.
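Illustrative sketch (not part of the published abstract): the abstract reports results as a per-patient score and as "precision at 100" without spelling out either calculation. The Python below is one plausible reading, under stated assumptions. It assumes that a patient's score is the sum of weights for the disease features found in that patient's record, and that each weight is the log of the feature's inverse prevalence; neither detail is stated in the abstract. The function names, feature names, prevalences, and patients are all hypothetical toy values.

```python
import math


def phers(patient_features, disease_features, prevalence):
    """Sum the weights of the disease features observed in one patient.

    Assumption: weight = log(1 / prevalence), so rarer features count more.
    """
    return sum(
        math.log(1.0 / prevalence[f])
        for f in disease_features
        if f in patient_features
    )


def precision_at_k(scores, cases, k):
    """Fraction of the k highest-scoring patients who are known cases."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for p in top if p in cases) / k


# Toy data (hypothetical): feature prevalence in the whole cohort.
prevalence = {"bronchiectasis": 0.01, "sinusitis": 0.10, "pancreatitis": 0.02}
disease_features = {"bronchiectasis", "sinusitis", "pancreatitis"}

# Toy data (hypothetical): features recorded for each patient.
patients = {
    "p1": {"bronchiectasis", "sinusitis"},
    "p2": {"sinusitis"},
    "p3": set(),
    "p4": {"bronchiectasis", "pancreatitis"},
}
cases = {"p1", "p4"}

scores = {
    pid: phers(feats, disease_features, prevalence)
    for pid, feats in patients.items()
}
print(precision_at_k(scores, cases, k=2))
```

In the study the cutoff is k = 100; the toy example uses k = 2 only because it has four patients.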
Pages: 1437-1447
Page count: 11