Genotyping Array Design and Data Quality Control in the Million Veteran Program

Cited by: 120
Authors
Hunter-Zinck, Haley [1]
Shi, Yunling [1]
Li, Man [1, 2]
Gorman, Bryan R. [1, 3]
Ji, Sun-Gou [1, 4, 15]
Sun, Ning [5, 6]
Webster, Teresa [7]
Liem, Andrew [1, 3]
Hsieh, Paul [1]
Devineni, Poornima [1]
Karnam, Purushotham [1]
Gong, Xin [1]
Radhakrishnan, Lakshmi [7]
Schmidt, Jeanette [7]
Assimes, Themistocles L. [8, 9]
Huang, Jie [1]
Pan, Cuiping [8, 9]
Humphries, Donald [1]
Brophy, Mary [1]
Moser, Jennifer [10]
Muralidhar, Sumitra [10]
Huang, Grant D. [10]
Przygodzki, Ronald [10]
Concato, John [5, 6, 16, 17]
Gaziano, John M. [1, 11, 12]
Gelernter, Joel [5, 6]
O'Donnell, Christopher J. [1]
Hauser, Elizabeth R. [13, 14]
Zhao, Hongyu [5, 6]
O'Leary, Timothy J. [10]
Tsao, Philip S. [8, 9]
Pyarajan, Saiju [1, 11, 12]
Affiliations
[1] VA Boston Healthcare Syst, VA Cooperat Studies Program, Boston, MA 02130 USA
[2] Univ Utah, Sch Med, Dept Internal Med, Salt Lake City, UT 84132 USA
[3] Booz Allen Hamilton, McLean, VA 22102 USA
[4] Seven Bridges, Boston, MA 02129 USA
[5] VA Connecticut Healthcare Syst, VA Cooperat Studies Program, West Haven, CT 06516 USA
[6] Yale Univ, Sch Med, New Haven, CT 06510 USA
[7] Thermo Fisher Sci, Santa Clara, CA 95054 USA
[8] VA Palo Alto Hlth Care Syst, Palo Alto, CA 94304 USA
[9] Stanford Univ, Dept Med, Sch Med, Stanford, CA 94305 USA
[10] Vet Hlth Adm, Off Res & Dev, Washington, DC 20571 USA
[11] Brigham & Womens Hosp, Dept Med, 75 Francis St, Boston, MA 02115 USA
[12] Harvard Sch Med, Boston, MA 02115 USA
[13] Durham VA Hlth Syst, Durham, NC 27705 USA
[14] Duke Univ, Dept Med, Durham, NC 27617 USA
[15] Bridgebio Pharma, Palo Alto, CA USA
[16] Yale Univ, Dept Med, New Haven, CT 06520 USA
[17] US FDA, Ctr Drug Evaluat & Res, Silver Spring, MD USA
Keywords
GENETIC EPIDEMIOLOGY RESEARCH; ADULT HEALTH; UK BIOBANK; ANCESTRY; RESOURCE
DOI
10.1016/j.ajhg.2020.03.004
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
The Million Veteran Program (MVP), initiated by the Department of Veterans Affairs (VA), aims to collect biosamples with consent from at least one million veterans. Presently, blood samples have been collected from over 800,000 enrolled participants. The size and diversity of the MVP cohort, as well as the availability of extensive VA electronic health records, make it a promising resource for precision medicine. MVP is conducting array-based genotyping to provide a genome-wide scan of the entire cohort, in parallel with whole-genome sequencing, methylation, and other 'omics assays. Here, we present the design and performance of the MVP 1.0 custom Axiom array, which was designed and developed as a single assay to be used across the multi-ethnic MVP cohort. A unified genetic quality-control analysis was developed and conducted on an initial tranche of 485,856 individuals, leading to a high-quality dataset of 459,777 unique individuals. A total of 668,418 genetic markers passed quality control and showed high-quality genotypes not only on common variants but also on rare variants. We confirmed that, with non-European individuals making up nearly 30%, MVP's substantial ancestral diversity surpasses that of other large biobanks. We also demonstrated the quality of the MVP dataset by replicating established genetic associations with height in individuals of European American and African American ancestry. This current dataset has been made available to approved MVP researchers for genome-wide association studies and other downstream analyses. Further data releases will be available for analysis as recruitment at the VA continues and the cohort expands both in size and diversity.
Pages: 535-548
Page count: 14