Simultaneous Genotype Calling and Haplotype Phasing Improves Genotype Accuracy and Reduces False-Positive Associations for Genome-wide Association Studies

Cited by: 153
Authors
Browning, Brian L. [1]
Yu, Zhaoxia [2]
Affiliations
[1] Univ Auckland, Dept Stat, Auckland 1142, New Zealand
[2] Univ Calif Irvine, Dept Stat, Irvine, CA 92697 USA
Funding
Wellcome Trust (UK)
Keywords
HIDDEN MARKOV-MODELS; SUSCEPTIBILITY LOCI; LARGE-SCALE; UNRELATED INDIVIDUALS; ARRAY DATA; INFERENCE; IMPUTATION; ALGORITHM; DISEASE; POLYMORPHISMS;
DOI
10.1016/j.ajhg.2009.11.004
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
We present a novel method for simultaneous genotype calling and haplotype-phase inference. Our method employs the computationally efficient BEAGLE haplotype-frequency model, which can be applied to large-scale studies with millions of markers and thousands of samples. We compare genotype calls made with our method to genotype calls made with the BIRDSEED, CHIAMO, GenCall, and ILLUMINUS genotype-calling methods, using genotype data from the Illumina 550K and Affymetrix 500K arrays. We show that our method has higher genotype-call accuracy and yields fewer uncalled genotypes than competing methods. We perform single-marker analysis of data from the Wellcome Trust Case Control Consortium bipolar disorder and type 2 diabetes studies. For bipolar disorder, the genotype calls in the original study yield 25 markers with apparent false-positive association with bipolar disorder at a p < 10^-7 significance level, whereas genotype calls made with our method yield no associated markers at this significance threshold. Conversely, for markers with replicated association with type 2 diabetes, there is good concordance between genotype calls used in the original study and calls made by our method. Results from single-marker and haplotypic analysis of our method's genotype calls for the bipolar disorder study indicate that our method is highly effective at eliminating genotyping artifacts that cause false-positive associations in genome-wide association studies. Our new genotype-calling methods are implemented in the BEAGLE and BEAGLECALL software packages.
Pages: 847-861
Page count: 15
Related papers
39 in total
  • [1] Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease
    Barrett, Jeffrey C.
    Hansoul, Sarah
    Nicolae, Dan L.
    Cho, Judy H.
    Duerr, Richard H.
    Rioux, John D.
    Brant, Steven R.
    Silverberg, Mark S.
    Taylor, Kent D.
    Barmada, M. Michael
    Bitton, Alain
    Dassopoulos, Themistocles
    Datta, Lisa Wu
    Green, Todd
    Griffiths, Anne M.
    Kistner, Emily O.
    Murtha, Michael T.
    Regueiro, Miguel D.
    Rotter, Jerome I.
    Schumm, L. Philip
    Steinhart, A. Hillary
    Targan, Stephan R.
    Xavier, Ramnik J.
    Libioulle, Cecile
    Sandor, Cynthia
    Lathrop, Mark
    Belaiche, Jacques
    Dewit, Olivier
    Gut, Ivo
    Heath, Simon
    Laukens, Debby
    Mni, Myriam
    Rutgeerts, Paul
    Van Gossum, Andre
    Zelenika, Diana
    Franchimont, Denis
    Hugot, Jean-Pierre
    de Vos, Martine
    Vermeire, Severine
    Louis, Edouard
    Cardon, Lon R.
    Anderson, Carl A.
    Drummond, Hazel
    Nimmo, Elaine
    Ahmad, Tariq
    Prescott, Natalie J.
    Onnie, Clive M.
    Fisher, Sheila A.
    Marchini, Jonathan
    Ghori, Jilur
    [J]. NATURE GENETICS, 2008, 40 (08) : 955 - 962
  • [2] A comparison of normalization methods for high density oligonucleotide array data based on variance and bias
    Bolstad, BM
    Irizarry, RA
    Åstrand, M
    Speed, TP
    [J]. BIOINFORMATICS, 2003, 19 (02) : 185 - 193
  • [3] Haplotypic analysis of Wellcome Trust Case Control Consortium data
    Browning, Brian L.
    Browning, Sharon R.
    [J]. HUMAN GENETICS, 2008, 123 (03) : 273 - 280
  • [4] Efficient multilocus association testing for whole genome association studies using localized haplotype clustering
    Browning, Brian L.
    Browning, Sharon R.
    [J]. GENETIC EPIDEMIOLOGY, 2007, 31 (05) : 365 - 375
  • [5] A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals
    Browning, Brian L.
    Browning, Sharon R.
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2009, 84 (02) : 210 - 223
  • [6] PRESTO: Rapid calculation of order statistic distributions and multiple-testing adjusted P-values via permutation for one and two-stage genetic association studies
    Browning, Brian L.
    [J]. BMC BIOINFORMATICS, 2008, 9 (1)
  • [7] Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering
    Browning, Sharon R.
    Browning, Brian L.
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2007, 81 (05) : 1084 - 1097
  • [8] Missing data imputation and haplotype phase inference for genome-wide association studies
    Browning, Sharon R.
    [J]. HUMAN GENETICS, 2008, 124 (05) : 439 - 450
  • [9] Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
    Burton, Paul R.
    Clayton, David G.
    Cardon, Lon R.
    Craddock, Nick
    Deloukas, Panos
    Duncanson, Audrey
    Kwiatkowski, Dominic P.
    McCarthy, Mark I.
    Ouwehand, Willem H.
    Samani, Nilesh J.
    Todd, John A.
    Donnelly, Peter
    Barrett, Jeffrey C.
    Davison, Dan
    Easton, Doug
    Evans, David
    Leung, Hin-Tak
    Marchini, Jonathan L.
    Morris, Andrew P.
    Spencer, Chris C. A.
    Tobin, Martin D.
    Attwood, Antony P.
    Boorman, James P.
    Cant, Barbara
    Everson, Ursula
    Hussey, Judith M.
    Jolley, Jennifer D.
    Knight, Alexandra S.
    Koch, Kerstin
    Meech, Elizabeth
    Nutland, Sarah
    Prowse, Christopher V.
    Stevens, Helen E.
    Taylor, Niall C.
    Walters, Graham R.
    Walker, Neil M.
    Watkins, Nicholas A.
    Winzer, Thilo
    Jones, Richard W.
    McArdle, Wendy L.
    Ring, Susan M.
    Strachan, David P.
    Pembrey, Marcus
    Breen, Gerome
    St Clair, David
    Caesar, Sian
    Gordon-Smith, Katherine
    Jones, Lisa
    Fraser, Christine
    Green, Elain K.
    [J]. NATURE, 2007, 447 (7145) : 661 - 678
  • [10] Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data
    Carvalho, Benilton
    Bengtsson, Henrik
    Speed, Terence P.
    Irizarry, Rafael A.
    [J]. BIOSTATISTICS, 2007, 8 (02) : 485 - 499