Haplotype reconstruction from genotype data using Imperfect Phylogeny

Cited by: 150
Authors
Halperin, E [1]
Eskin, E
Affiliations
[1] Univ Calif Berkeley, CS Div, Berkeley, CA 92093 USA
[2] Hebrew Univ Jerusalem, Sch Engn & Comp Sci, IL-91904 Jerusalem, Israel
DOI
10.1093/bioinformatics/bth149
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Modeling human variation is critical to understanding the genetic basis of complex diseases. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs), which are mutations at a single nucleotide position. To characterize the genetic variation between different people, we must determine an individual's haplotype, that is, which nucleotide base occurs at each position of these common SNPs on each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes: SNPs are organized in highly correlated 'blocks'. In a few recent studies, considerable parts of the human genome were partitioned into blocks such that the majority of the sequenced genotypes have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks and, for each block, predicts the common haplotypes and each individual's haplotype. We evaluate our method on biological data. It predicts the common haplotypes perfectly and has a very low error rate (<2% over the data) when the predictions for the uncommon haplotypes are taken into account. Our method is extremely efficient compared with previous methods such as PHASE and HAPLOTYPER. Its efficiency allows us to find the block partition of the haplotypes, to cope with missing data and to work with large datasets.
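To make the problem statement concrete, the following minimal sketch shows what "haplotype resolution from genotype data" means within a single block. It is not the imperfect phylogeny algorithm of the paper. The genotype encoding (0 or 1 for a homozygous site, 2 for a heterozygous site) and the function names `compatible_pairs` and `resolve_with_common` are assumptions made for illustration only. The sketch enumerates every haplotype pair consistent with a genotype and then keeps the pairs whose members both belong to a supplied set of common haplotypes.

```python
from itertools import product

HET = 2  # assumed encoding: 0/1 = homozygous for that allele, 2 = heterozygous


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs consistent with a genotype."""
    het_sites = [i for i, g in enumerate(genotype) if g == HET]
    pairs = set()
    # Each heterozygous site carries allele 0 on one chromosome and 1 on the other.
    for bits in product((0, 1), repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, b in zip(het_sites, bits):
            h1[site] = b
            h2[site] = 1 - b
        # Sort the two haplotypes so (a, b) and (b, a) count as the same pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def resolve_with_common(genotype, common_haplotypes):
    """Keep only the pairs whose two haplotypes both appear in the given set."""
    common = set(common_haplotypes)
    return [p for p in compatible_pairs(genotype)
            if p[0] in common and p[1] in common]
```

A genotype with k heterozygous sites has 2^(k-1) compatible pairs for k >= 1, which is why the genotype alone is ambiguous. Restricting the candidates to a small set of common haplotypes per block, as the abstract describes, removes much of that ambiguity.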
Pages: 1842-1849
Page count: 8