A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals

Cited by: 1274
Authors
Browning, Brian L. [1]
Browning, Sharon R. [1]
Affiliations
[1] Univ Auckland, Dept Stat, Auckland 1142, New Zealand
Funding
Wellcome Trust (UK); National Institutes of Health (USA);
Keywords
GENOME-WIDE ASSOCIATION; LARGE-SCALE; SUSCEPTIBILITY LOCI; METAANALYSIS; MODELS; POWER; MAP;
DOI
10.1016/j.ajhg.2009.01.005
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
We present methods for imputing data for ungenotyped markers and for inferring haplotype phase in large data sets of unrelated individuals and parent-offspring trios. Our methods make use of known haplotype phase when it is available, and our methods are computationally efficient so that the full information in large reference panels with thousands of individuals is utilized. We demonstrate that substantial gains in imputation accuracy accrue with increasingly large reference panel sizes, particularly when imputing low-frequency variants, and that unphased reference panels can provide highly accurate genotype imputation. We place our methodology in a unified framework that enables the simultaneous use of unphased and phased data from trios and unrelated individuals in a single analysis. For unrelated individuals, our imputation methods produce well-calibrated posterior genotype probabilities and highly accurate allele frequency estimates. For trios, our haplotype-inference method is four orders of magnitude faster than the gold-standard PHASE program and has excellent accuracy. Our methods enable genotype imputation to be performed with unphased trio or unrelated reference panels, thus accounting for haplotype-phase uncertainty in the reference panel. We present a useful measure of imputation accuracy, allelic R², and show that this measure can be estimated accurately from posterior genotype probabilities. Our methods are implemented in version 3.0 of the BEAGLE software package.
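The abstract's closing point, that allelic R² can be estimated from posterior genotype probabilities alone, can be illustrated with a minimal sketch. It assumes allelic R² is the squared correlation between the allele dosage of the most likely imputed genotype and the true (unobserved) dosage, and that the posterior probabilities are well calibrated, so that the moments of the true dosage can be replaced by their posterior expectations. The function name `allelic_r2_estimate` and the input layout are hypothetical and are not taken from BEAGLE; the exact estimator used by the software should be checked against the paper itself.

```python
def allelic_r2_estimate(posteriors):
    """Estimate allelic R^2 for one marker from posterior genotype probabilities.

    posteriors: list of (p0, p1, p2) tuples, one per individual, giving the
    posterior probability of carrying 0, 1, or 2 copies of the counted allele.
    This is an illustrative sketch under the assumptions stated above, not
    BEAGLE's own code.
    """
    n = len(posteriors)
    sum_z = sum_z2 = sum_u = sum_w = sum_zu = 0.0
    for probs in posteriors:
        p0, p1, p2 = probs
        # z: dosage of the most likely genotype (ties resolve to the lower dosage)
        z = max(range(3), key=lambda g: probs[g])
        # u: posterior expected dosage, standing in for the true dosage
        u = p1 + 2.0 * p2
        # w: posterior expected squared dosage, standing in for the true squared dosage
        w = p1 + 4.0 * p2
        sum_z += z
        sum_z2 += z * z
        sum_u += u
        sum_w += w
        sum_zu += z * u

    cov = sum_zu - sum_z * sum_u / n        # n * estimated Cov(imputed, true)
    var_true = sum_w - sum_u * sum_u / n    # n * estimated Var(true dosage)
    var_imputed = sum_z2 - sum_z * sum_z / n  # n * Var(imputed dosage)
    if var_true <= 0.0 or var_imputed <= 0.0:
        return 0.0  # monomorphic marker: correlation is undefined, report 0
    return cov * cov / (var_true * var_imputed)


if __name__ == "__main__":
    confident = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    uncertain = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1), (0.0, 0.2, 0.8)]
    print(allelic_r2_estimate(confident))
    print(allelic_r2_estimate(uncertain))
```

When every posterior puts all its mass on one genotype, the estimate equals 1; as the posteriors become more diffuse, the estimated variance of the true dosage grows relative to the covariance term and the estimate falls below 1.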
Pages: 210-223
Number of pages: 14
Related papers
29 in total
  • [1] Evaluating the effects of imputation on the power, coverage, and cost efficiency of genome-wide SNP platforms
    Anderson, Carl A.
    Pettersson, Fredrik H.
    Barrett, Jeffrey C.
    Zhuang, Joanna J.
    Ragoussis, Jiannis
    Cardon, Lon R.
    Morris, Andrew P.
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2008, 83 (01) : 112 - 119
  • [2] Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease
    Barrett, Jeffrey C.
    Hansoul, Sarah
    Nicolae, Dan L.
    Cho, Judy H.
    Duerr, Richard H.
    Rioux, John D.
    Brant, Steven R.
    Silverberg, Mark S.
    Taylor, Kent D.
    Barmada, M. Michael
    Bitton, Alain
    Dassopoulos, Themistocles
    Datta, Lisa Wu
    Green, Todd
    Griffiths, Anne M.
    Kistner, Emily O.
    Murtha, Michael T.
    Regueiro, Miguel D.
    Rotter, Jerome I.
    Schumm, L. Philip
    Steinhart, A. Hillary
    Targan, Stephan R.
    Xavier, Ramnik J.
    Libioulle, Cecile
    Sandor, Cynthia
    Lathrop, Mark
    Belaiche, Jacques
    Dewit, Olivier
    Gut, Ivo
    Heath, Simon
    Laukens, Debby
    Mni, Myriam
    Rutgeerts, Paul
    Van Gossum, Andre
    Zelenika, Diana
    Franchimont, Denis
    Hugot, Jean-Pierre
    de Vos, Martine
    Vermeire, Severine
    Louis, Edouard
    Cardon, Lon R.
    Anderson, Carl A.
    Drummond, Hazel
    Nimmo, Elaine
    Ahmad, Tariq
    Prescott, Natalie J.
    Onnie, Clive M.
    Fisher, Sheila A.
    Marchini, Jonathan
    Ghori, Jilur
    [J]. NATURE GENETICS, 2008, 40 (08) : 955 - 962
  • [3] Haplotypic analysis of Wellcome Trust Case Control Consortium data
    Browning, Brian L.
    Browning, Sharon R.
    [J]. HUMAN GENETICS, 2008, 123 (03) : 273 - 280
  • [4] Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering
    Browning, Sharon R.
    Browning, Brian L.
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2007, 81 (05) : 1084 - 1097
  • [5] Multilocus association mapping using variable-length Markov chains
    Browning, Sharon R.
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2006, 78 (06) : 903 - 913
  • [6] Missing data imputation and haplotype phase inference for genome-wide association studies
    Browning, Sharon R.
    [J]. HUMAN GENETICS, 2008, 124 (05) : 439 - 450
  • [7] Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
    Burton, Paul R.
    Clayton, David G.
    Cardon, Lon R.
    Craddock, Nick
    Deloukas, Panos
    Duncanson, Audrey
    Kwiatkowski, Dominic P.
    McCarthy, Mark I.
    Ouwehand, Willem H.
    Samani, Nilesh J.
    Todd, John A.
    Donnelly, Peter
    Barrett, Jeffrey C.
    Davison, Dan
    Easton, Doug
    Evans, David
    Leung, Hin-Tak
    Marchini, Jonathan L.
    Morris, Andrew P.
    Spencer, Chris C. A.
    Tobin, Martin D.
    Attwood, Antony P.
    Boorman, James P.
    Cant, Barbara
    Everson, Ursula
    Hussey, Judith M.
    Jolley, Jennifer D.
    Knight, Alexandra S.
    Koch, Kerstin
    Meech, Elizabeth
    Nutland, Sarah
    Prowse, Christopher V.
    Stevens, Helen E.
    Taylor, Niall C.
    Walters, Graham R.
    Walker, Neil M.
    Watkins, Nicholas A.
    Winzer, Thilo
    Jones, Richard W.
    McArdle, Wendy L.
    Ring, Susan M.
    Strachan, David P.
    Pembrey, Marcus
    Breen, Gerome
    St Clair, David
    Caesar, Sian
    Gordon-Smith, Katherine
    Jones, Lisa
    Fraser, Christine
    Green, Elain K.
    [J]. NATURE, 2007, 447 (7145) : 661 - 678
  • [8] Population structure, differential bias and genomic control in a large-scale, case-control association study
    Clayton, DG
    Walker, NM
    Smyth, DJ
    Pask, R
    Cooper, JD
    Maier, LM
    Smink, LJ
    Lam, AC
    Ovington, NR
    Stevens, HE
    Nutland, S
    Howson, JMM
    Faham, M
    Moorhead, M
    Jones, HB
    Falkowski, M
    Hardenbol, P
    Willis, TD
    Todd, JA
    [J]. NATURE GENETICS, 2005, 37 (11) : 1243 - 1246
  • [9] Practical aspects of imputation-driven meta-analysis of genome-wide association studies
    de Bakker, Paul I. W.
    Ferreira, Manuel A. R.
    Jia, Xiaoming
    Neale, Benjamin M.
    Raychaudhuri, Soumya
    Voight, Benjamin F.
    [J]. HUMAN MOLECULAR GENETICS, 2008, 17 : R122 - R128
  • [10] A second generation human haplotype map of over 3.1 million SNPs
    Frazer, Kelly A.
    Ballinger, Dennis G.
    Cox, David R.
    Hinds, David A.
    Stuve, Laura L.
    Gibbs, Richard A.
    Belmont, John W.
    Boudreau, Andrew
    Hardenbol, Paul
    Leal, Suzanne M.
    Pasternak, Shiran
    Wheeler, David A.
    Willis, Thomas D.
    Yu, Fuli
    Yang, Huanming
    Zeng, Changqing
    Gao, Yang
    Hu, Haoran
    Hu, Weitao
    Li, Chaohua
    Lin, Wei
    Liu, Siqi
    Pan, Hao
    Tang, Xiaoli
    Wang, Jian
    Wang, Wei
    Yu, Jun
    Zhang, Bo
    Zhang, Qingrun
    Zhao, Hongbin
    Zhao, Hui
    Zhou, Jun
    Gabriel, Stacey B.
    Barry, Rachel
    Blumenstiel, Brendan
    Camargo, Amy
    Defelice, Matthew
    Faggart, Maura
    Goyette, Mary
    Gupta, Supriya
    Moore, Jamie
    Nguyen, Huy
    Onofrio, Robert C.
    Parkin, Melissa
    Roy, Jessica
    Stahl, Erich
    Winchester, Ellen
    Ziaugra, Liuda
    Altshuler, David
    Shen, Yan
    [J]. NATURE, 2007, 449 (7164) : 851 - U3