The Use of Family Relationships and Linkage Disequilibrium to Impute Phase and Missing Genotypes in Up to Whole-Genome Sequence Density Genotypic Data

Cited by: 63
Authors
Meuwissen, Theo [1]
Goddard, Mike [2, 3]
Affiliations
[1] Norwegian Univ Life Sci, N-1430 As, Norway
[2] Univ Melbourne, Melbourne, Vic, Australia
[3] Dept Primary Ind, Melbourne, Vic 3010, Australia
Keywords
EFFICIENT ALGORITHM; POPULATION; DESCENT; MODEL; PEDIGREE; PROBABILITIES; PREDICTION; IDENTITY; LOCI
DOI
10.1534/genetics.110.113936
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
A novel method, called linkage disequilibrium multilocus iterative peeling (LDMIP), for the imputation of phase and missing genotypes is developed. LDMIP performs an iterative peeling step for every locus, which accounts for the family data, and uses a forward-backward algorithm to accumulate information across loci. Marker similarity between haplotype pairs is used to impute possible missing genotypes and phases, which relies on the linkage disequilibrium between closely linked markers. After this imputation step, the combined iterative peeling/forward-backward algorithm is applied again, until convergence. The calculations per iteration scale linearly with the number of markers and the number of individuals in the pedigree, which makes LDMIP well suited to large numbers of markers and/or large numbers of individuals. Per-iteration calculations scale quadratically with the number of alleles, which implies that biallelic markers are preferred. In a situation with up to 15% randomly missing genotypes, the error rate of the imputed genotypes was <1% and ~99% of the missing genotypes were imputed. In another example, LDMIP was used to impute whole-genome sequence data consisting of 17,321 SNPs on a chromosome. Imputation of the sequence was based on the information of 20 (re)sequenced founder individuals and genotyping their descendants for a panel of 3000 SNPs. The error rate of the imputed SNP genotypes was 10%. However, if the parents of these 20 founders are also sequenced, >99% of missing genotypes are imputed correctly.
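The forward-backward step mentioned in the abstract can be illustrated with a generic two-state sketch. This is not the LDMIP implementation: the hidden state here is simply which of two parental haplotypes was transmitted at each locus, and the function name `forward_backward`, the `emissions` layout, and the `recomb` list are all assumptions made for illustration. A missing genotype is represented by equal emission probabilities for both states.

```python
def forward_backward(emissions, recomb):
    """Posterior state probabilities along a chromosome (illustrative sketch).

    emissions[k][s]: P(observation at locus k | hidden state s), s in {0, 1}.
    recomb[k]: assumed recombination probability between locus k and k + 1.
    Assumes at least one state has non-zero emission at every locus.
    """
    n = len(emissions)

    def step(vec, r):
        # Two-state transition: stay with probability 1 - r, switch with r.
        return [vec[0] * (1 - r) + vec[1] * r, vec[0] * r + vec[1] * (1 - r)]

    def normalise(vec):
        total = sum(vec)
        return [v / total for v in vec]

    # Forward pass: information from loci 0..k, rescaled at every locus.
    fwd = []
    for k in range(n):
        prior = [0.5, 0.5] if k == 0 else step(fwd[k - 1], recomb[k - 1])
        fwd.append(normalise([prior[s] * emissions[k][s] for s in (0, 1)]))

    # Backward pass: information from loci k+1..n-1.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for k in range(n - 2, -1, -1):
        nxt = [bwd[k + 1][s] * emissions[k + 1][s] for s in (0, 1)]
        bwd[k] = normalise(step(nxt, recomb[k]))

    # Combine both directions into a posterior per locus.
    return [normalise([fwd[k][s] * bwd[k][s] for s in (0, 1)]) for k in range(n)]


# Middle locus is missing (uninformative); flanking loci favour state 0.
posterior = forward_backward(
    emissions=[[0.99, 0.01], [1.0, 1.0], [0.99, 0.01]],
    recomb=[0.01, 0.01],
)
```

Each pass visits every locus once, so the work in this sketch grows linearly with the number of loci, and the transition step touches every pair of hidden states, so it grows with the square of the number of states.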
Pages: 1441 / U450
Number of pages: 11
Related papers
19 in total
  • [1] GENERAL MODEL FOR GENETIC ANALYSIS OF PEDIGREE DATA
    ELSTON, RC
    STEWART, J
    [J]. HUMAN HEREDITY, 1971, 21 (06) : 523 - &
  • [2] Falconer D. S., 1996, Introduction to quantitative genetics.
  • [3] AN EFFICIENT ALGORITHM TO COMPUTE THE POSTERIOR GENOTYPIC DISTRIBUTION FOR EVERY MEMBER OF A PEDIGREE WITHOUT LOOPS
    FERNANDO, RL
    STRICKER, C
    ELSTON, RC
    [J]. THEORETICAL AND APPLIED GENETICS, 1993, 87 (1-2) : 89 - 93
  • [4] Generating samples under a Wright-Fisher neutral model of genetic variation
    Hudson, RR
    [J]. BIOINFORMATICS, 2002, 18 (02) : 337 - 338
  • [5] Computing approximate monogenic model likelihoods in large pedigrees with loops
    Janss, LLG
    VanArendonk, JAM
    VanderWerf, JHJ
    [J]. GENETICS SELECTION EVOLUTION, 1995, 27 (06) : 567 - 579
  • [6] An efficient algorithm for segregation analysis in large populations
    Kerr, RJ
    Kinghorn, BP
    [J]. JOURNAL OF ANIMAL BREEDING AND GENETICS-ZEITSCHRIFT FUR TIERZUCHTUNG UND ZUCHTUNGSBIOLOGIE, 1996, 113 (06) : 457 - 469
  • [7] KIMURA M, 1969, GENETICS, V61, P893
  • [8] Detection of sharing by descent, long-range phasing and haplotype imputation
    Kong, Augustine
    Masson, Gisli
    Frigge, Michael L.
    Gylfason, Arnaldur
    Zusmanovich, Pasha
    Thorleifsson, Gudmar
    Olason, Pall I.
    Ingason, Andres
    Steinberg, Stacy
    Rafnar, Thorunn
    Sulem, Patrick
    Mouy, Magali
    Jonsson, Frosti
    Thorsteinsdottir, Unnur
    Gudbjartsson, Daniel F.
    Stefansson, Hreinn
    Stefansson, Kari
    [J]. NATURE GENETICS, 2008, 40 (09) : 1068 - 1075
  • [9] CONSTRUCTION OF MULTILOCUS GENETIC-LINKAGE MAPS IN HUMANS
    LANDER, ES
    GREEN, P
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1987, 84 (08) : 2363 - 2367
  • [10] Meuwissen THE, 2001, GENET SEL EVOL, V33, P605, DOI 10.1051/gse:2001134