Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network

Cited by: 264
Authors
Newton, Katherine M. [1]
Peissig, Peggy L. [2]
Kho, Abel Ngo [3]
Bielinski, Suzette J. [4]
Berg, Richard L. [2]
Choudhary, Vidhu [2]
Basford, Melissa [5]
Chute, Christopher G. [6]
Kullo, Iftikhar J. [7]
Li, Rongling [8]
Pacheco, Jennifer A. [3]
Rasmussen, Luke V. [3]
Spangler, Leslie [1]
Denny, Joshua C. [9, 10]
Affiliations
[1] Grp Hlth Res Inst, Seattle, WA 98101 USA
[2] Marshfield Clin Res Fdn, Dept Biomed Informat, Marshfield, WI USA
[3] Northwestern Univ, Feinberg Sch Med, Chicago, IL 60611 USA
[4] Mayo Clin, Dept Hlth Sci Res, Dept Epidemiol, Rochester, MN USA
[5] Vanderbilt Univ, Off Personalized Med, Nashville, TN USA
[6] Mayo Clin, Dept Hlth Sci Res, Div Biomed Stat & Informat, Rochester, MN USA
[7] Mayo Clin, Div Cardiovasc Dis, Rochester, MN USA
[8] NHGRI, Off Populat Genom, Bethesda, MD 20892 USA
[9] Vanderbilt Univ, Dept Biomed Informat, Nashville, TN USA
[10] Vanderbilt Univ, Dept Med, Nashville, TN USA
Keywords
GENOME-WIDE ASSOCIATION; HEALTH RECORDS; VARIANTS; DEFINITION
DOI
10.1136/amiajnl-2012-000896
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Background: Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats.

Objective: To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies.

Materials and methods: The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University.

Results: By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results.

Conclusions: Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.
Pages: E147-E154
Number of pages: 8