Genotyping Array Design and Data Quality Control in the Million Veteran Program

Cited by: 105
Authors
Hunter-Zinck, Haley [1]
Shi, Yunling [1]
Li, Man [1,2]
Gorman, Bryan R. [1,3]
Ji, Sun-Gou [1,4,15]
Sun, Ning [5,6]
Webster, Teresa [7]
Liem, Andrew [1,3]
Hsieh, Paul [1]
Devineni, Poornima [1]
Karnam, Purushotham [1]
Gong, Xin [1]
Radhakrishnan, Lakshmi [7]
Schmidt, Jeanette [7]
Assimes, Themistocles L. [8,9]
Huang, Jie [1]
Pan, Cuiping [8,9]
Humphries, Donald [1]
Brophy, Mary [1]
Moser, Jennifer [10]
Muralidhar, Sumitra [10]
Huang, Grant D. [10]
Przygodzki, Ronald [10]
Concato, John [5,6,16,17]
Gaziano, John M. [1,11,12]
Gelernter, Joel [5,6]
O'Donnell, Christopher J. [1]
Hauser, Elizabeth R. [13,14]
Zhao, Hongyu [5,6]
O'Leary, Timothy J. [10]
Tsao, Philip S. [8,9]
Pyarajan, Saiju [1,11,12]
Affiliations
[1] VA Boston Healthcare Syst, VA Cooperat Studies Program, Boston, MA 02130 USA
[2] Univ Utah, Sch Med, Dept Internal Med, Salt Lake City, UT 84132 USA
[3] Booz Allen Hamilton, McLean, VA 22102 USA
[4] Seven Bridges, Boston, MA 02129 USA
[5] VA Connecticut Healthcare Syst, VA Cooperat Studies Program, West Haven, CT 06516 USA
[6] Yale Univ, Sch Med, New Haven, CT 06510 USA
[7] Thermo Fisher Sci, Santa Clara, CA 95054 USA
[8] VA Palo Alto Hlth Care Syst, Palo Alto, CA 94304 USA
[9] Stanford Univ, Dept Med, Sch Med, Stanford, CA 94305 USA
[10] Vet Hlth Adm, Off Res & Dev, Washington, DC 20571 USA
[11] Brigham & Womens Hosp, Dept Med, 75 Francis St, Boston, MA 02115 USA
[12] Harvard Sch Med, Boston, MA 02115 USA
[13] Durham VA Hlth Syst, Durham, NC 27705 USA
[14] Duke Univ, Dept Med, Durham, NC 27617 USA
[15] Bridgebio Pharma, Palo Alto, CA USA
[16] Yale Univ, Dept Med, New Haven, CT 06520 USA
[17] US FDA, Ctr Drug Evaluat & Res, Silver Spring, MD USA
Keywords
GENETIC EPIDEMIOLOGY RESEARCH; ADULT HEALTH; UK BIOBANK; ANCESTRY; RESOURCE;
DOI
10.1016/j.ajhg.2020.03.004
Chinese Library Classification (CLC) number
Q3 [Genetics];
Subject classification codes
071007 ; 090102 ;
Abstract
The Million Veteran Program (MVP), initiated by the Department of Veterans Affairs (VA), aims to collect biosamples with consent from at least one million veterans. Presently, blood samples have been collected from over 800,000 enrolled participants. The size and diversity of the MVP cohort, as well as the availability of extensive VA electronic health records, make it a promising resource for precision medicine. MVP is conducting array-based genotyping to provide a genome-wide scan of the entire cohort, in parallel with whole-genome sequencing, methylation, and other 'omics assays. Here, we present the design and performance of the MVP 1.0 custom Axiom array, which was designed and developed as a single assay to be used across the multi-ethnic MVP cohort. A unified genetic quality-control analysis was developed and conducted on an initial tranche of 485,856 individuals, leading to a high-quality dataset of 459,777 unique individuals. A total of 668,418 genetic markers passed quality control and showed high-quality genotypes not only on common variants but also on rare variants. We confirmed that, with non-European individuals making up nearly 30%, MVP's substantial ancestral diversity surpasses that of other large biobanks. We also demonstrated the quality of the MVP dataset by replicating established genetic associations with height in individuals of European American and African American ancestry. This current dataset has been made available to approved MVP researchers for genome-wide association studies and other downstream analyses. Further data releases will be available for analysis as recruitment at the VA continues and the cohort expands in both size and diversity.
Pages: 535-548
Page count: 14
Related papers
32 in total
  • [1] Affymetrix, 2016, AX GEN SOL DAT AN GU
  • [2] Fast model-based estimation of ancestry in unrelated individuals
    Alexander, David H.
    Novembre, John
    Lange, Kenneth
    [J]. GENOME RESEARCH, 2009, 19 (09) : 1655 - 1664
  • [3] Hundreds of variants clustered in genomic loci and biological pathways affect human height
    Allen, Hana Lango
    Estrada, Karol
    Lettre, Guillaume
    Berndt, Sonja I.
    Weedon, Michael N.
    Rivadeneira, Fernando
    Willer, Cristen J.
    Jackson, Anne U.
    Vedantam, Sailaja
    Raychaudhuri, Soumya
    Ferreira, Teresa
    Wood, Andrew R.
    Weyant, Robert J.
    Segre, Ayellet V.
    Speliotes, Elizabeth K.
    Wheeler, Eleanor
    Soranzo, Nicole
    Park, Ju-Hyun
    Yang, Jian
    Gudbjartsson, Daniel
    Heard-Costa, Nancy L.
    Randall, Joshua C.
    Qi, Lu
    Smith, Albert Vernon
    Maegi, Reedik
    Pastinen, Tomi
    Liang, Liming
    Heid, Iris M.
    Luan, Jian'an
    Thorleifsson, Gudmar
    Winkler, Thomas W.
    Goddard, Michael E.
    Lo, Ken Sin
    Palmer, Cameron
    Workalemahu, Tsegaselassie
    Aulchenko, Yurii S.
    Johansson, Asa
    Zillikens, M. Carola
    Feitosa, Mary F.
    Esko, Tonu
    Johnson, Toby
    Ketkar, Shamika
    Kraft, Peter
    Mangino, Massimo
    Prokopenko, Inga
    Absher, Devin
    Albrecht, Eva
    Ernst, Florian
    Glazer, Nicole L.
    Hayward, Caroline
    [J]. NATURE, 2010, 467 (7317) : 832 - 838
  • [4] A global reference for human genetic variation
    Altshuler, David M.
    Durbin, Richard M.
    Abecasis, Goncalo R.
    Bentley, David R.
    Chakravarti, Aravinda
    Clark, Andrew G.
    Donnelly, Peter
    Eichler, Evan E.
    Flicek, Paul
    Gabriel, Stacey B.
    Gibbs, Richard A.
    Green, Eric D.
    Hurles, Matthew E.
    Knoppers, Bartha M.
    Korbel, Jan O.
    Lander, Eric S.
    Lee, Charles
    Lehrach, Hans
    Mardis, Elaine R.
    Marth, Gabor T.
    McVean, Gil A.
    Nickerson, Deborah A.
    Wang, Jun
    Wilson, Richard K.
    Boerwinkle, Eric
    Doddapaneni, Harsha
    Han, Yi
    Korchina, Viktoriya
    Kovar, Christie
    Lee, Sandra
    Muzny, Donna
    Reid, Jeffrey G.
    Zhu, Yiming
    Chang, Yuqi
    Feng, Qiang
    Fang, Xiaodong
    Guo, Xiaosen
    Jian, Min
    Jiang, Hui
    Jin, Xin
    Lan, Tianming
    Li, Guoqing
    Li, Jingxiang
    Li, Yingrui
    Liu, Shengmao
    Liu, Xiao
    Lu, Yao
    Ma, Xuedi
    Tang, Meifang
    Wang, Bo
    [J]. NATURE, 2015, 526 (7571) : 68 - +
  • [5] Characterizing Race/Ethnicity and Genetic Ancestry for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort
    Banda, Yambazi
    Kvale, Mark N.
    Hoffmann, Thomas J.
    Hesselson, Stephanie E.
    Ranatunga, Dilrini
    Tang, Hua
    Sabatti, Chiara
    Croen, Lisa A.
    Dispensa, Brad P.
    Henderson, Mary
    Iribarren, Carlos
    Jorgenson, Eric
    Kushi, Lawrence H.
    Ludwig, Dana
    Olberg, Diane
    Quesenberry, Charles P., Jr.
    Rowell, Sarah
    Sadler, Marianne
    Sakoda, Lori C.
    Sciortino, Stanley
    Shen, Ling
    Smethurst, David
    Somkin, Carol P.
    Van Den Eeden, Stephen K.
    Walter, Lawrence
    Whitmer, Rachel A.
    Kwok, Pui-Yan
    Schaefer, Catherine
    Risch, Neil
    [J]. GENETICS, 2015, 200 (04) : 1285 - +
  • [6] LD Score regression distinguishes confounding from polygenicity in genome-wide association studies
    Bulik-Sullivan, Brendan K.
    Loh, Po-Ru
    Finucane, Hilary K.
    Ripke, Stephan
    Yang, Jian
    Patterson, Nick
    Daly, Mark J.
    Price, Alkes L.
    Neale, Benjamin M.
    [J]. NATURE GENETICS, 2015, 47 (03) : 291 - +
  • [7] The UK Biobank resource with deep phenotyping and genomic data
    Bycroft, Clare
    Freeman, Colin
    Petkova, Desislava
    Band, Gavin
    Elliott, Lloyd T.
    Sharp, Kevin
    Motyer, Allan
    Vukcevic, Damjan
    Delaneau, Olivier
    O'Connell, Jared
    Cortes, Adrian
    Welsh, Samantha
    Young, Alan
    Effingham, Mark
    McVean, Gil
    Leslie, Stephen
    Allen, Naomi
    Donnelly, Peter
    Marchini, Jonathan
    [J]. NATURE, 2018, 562 (7726) : 203 - +
  • [8] C.I.A, 2017, CENTR INT AG WORLD F
  • [9] Second-generation PLINK: rising to the challenge of larger and richer datasets
    Chang, Christopher C.
    Chow, Carson C.
    Tellier, Laurent C. A. M.
    Vattikuti, Shashaank
    Purcell, Shaun M.
    Lee, James J.
    [J]. GIGASCIENCE, 2015, 4
  • [10] Dai C.L., 2019, BIORXIV, DOI 10.1101/57741