Fast and accurate population admixture inference from genotype data from a few microsatellites to millions of SNPs

Cited by: 17
Authors
Wang, Jinliang [1]
Affiliations
[1] Zool Soc London, Inst Zool, London NW1 4RY, England
Keywords
BAYESIAN-ANALYSIS; GENETIC-STRUCTURE; ANCESTRY; DIFFERENTIATION; IDENTIFICATION; OPTIMIZATION
DOI
10.1038/s41437-022-00535-z
Chinese Library Classification number
Q14 [Ecology (Bioecology)]
Discipline classification code
071012; 0713
Abstract
Model-based (likelihood and Bayesian) and non-model-based (PCA and K-means clustering) methods have been developed to identify populations and assign individuals to the identified populations using marker genotype data. Model-based methods are favoured because they are based on a probabilistic model of population genetics with biologically meaningful parameters and thus produce results that are easily interpretable and applicable. Furthermore, they often yield more accurate structure inferences than non-model-based methods. However, current model-based methods are either computationally demanding, and thus applicable to small problems only, or use simplified admixture models that could yield inaccurate results in difficult situations such as unbalanced sampling. In this study, I propose new likelihood methods for fast and accurate population admixture inference using genotype data ranging from a few multiallelic microsatellites to millions of diallelic SNPs. The methods first conduct a clustering analysis of coarse-grained population structure using the mixture model and the simulated annealing algorithm, and then an admixture analysis of fine-grained population structure using the clustering results as the starting point of an expectation maximisation algorithm. Extensive analyses of both simulated and empirical data show that the new methods compare favourably with existing methods in both accuracy and running speed. They can analyse small datasets with just a few multiallelic microsatellites, but can also handle, in parallel, terabytes of data with millions of markers and millions of individuals. In difficult situations, such as many and/or weakly differentiated populations or unbalanced or very small samples of individuals, the new methods are substantially more accurate than other methods.
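The second stage described in the abstract, an expectation maximisation (EM) admixture analysis started from hard clustering results, can be illustrated with a minimal sketch. It is not the author's implementation: it uses the generic textbook EM updates for the standard admixture model with diallelic SNPs, takes the cluster labels as given rather than finding them by simulated annealing, and assumes no missing genotypes. All names (`init_from_clusters`, `em_step`, `fit_admixture`, the smoothing constant `alpha`) are hypothetical.

```python
import math


def log_likelihood(G, Q, F):
    """Binomial log-likelihood (up to a constant) of genotypes G.

    G[i][l] is the count (0, 1 or 2) of a reference allele, Q[i][k] the
    ancestry proportion of individual i from population k, and F[k][l]
    the reference allele frequency of population k at locus l.
    """
    ll = 0.0
    for i, row in enumerate(G):
        for l, g in enumerate(row):
            p = sum(Q[i][k] * F[k][l] for k in range(len(F)))
            ll += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return ll


def init_from_clusters(G, labels, K, alpha=0.1):
    """Turn hard cluster labels into smoothed starting values for Q and F."""
    n, L = len(G), len(G[0])
    # Most ancestry goes to the assigned cluster; alpha keeps every entry > 0.
    Q = [[((1.0 if labels[i] == k else 0.0) + alpha) / (1.0 + K * alpha)
          for k in range(K)] for i in range(n)]
    F = []
    for k in range(K):
        members = [i for i in range(n) if labels[i] == k]
        # Add-one smoothing keeps frequencies strictly inside (0, 1).
        F.append([(sum(G[i][l] for i in members) + 1.0)
                  / (2.0 * len(members) + 2.0) for l in range(L)])
    return Q, F


def em_step(G, Q, F, eps=1e-6):
    """One EM iteration: expected allele origins, then closed-form updates."""
    n, L, K = len(G), len(G[0]), len(F)
    q_num = [[0.0] * K for _ in range(n)]
    f_num = [[0.0] * L for _ in range(K)]
    f_den = [[0.0] * L for _ in range(K)]
    for i in range(n):
        for l in range(L):
            g = G[i][l]
            p = sum(Q[i][k] * F[k][l] for k in range(K))
            for k in range(K):
                # Expected numbers of reference / alternative alleles
                # of individual i at locus l that came from population k.
                a = g * Q[i][k] * F[k][l] / p
                b = (2 - g) * Q[i][k] * (1.0 - F[k][l]) / (1.0 - p)
                q_num[i][k] += a + b
                f_num[k][l] += a
                f_den[k][l] += a + b
    new_Q = [[q_num[i][k] / (2.0 * L) for k in range(K)] for i in range(n)]
    new_F = [[min(1.0 - eps, max(eps, f_num[k][l] / f_den[k][l]))
              if f_den[k][l] > 0 else F[k][l]
              for l in range(L)] for k in range(K)]
    return new_Q, new_F


def fit_admixture(G, labels, K, n_iter=50):
    """Run EM from a clustering-based start; return Q, F and the log-likelihood trace."""
    Q, F = init_from_clusters(G, labels, K)
    trace = [log_likelihood(G, Q, F)]
    for _ in range(n_iter):
        Q, F = em_step(G, Q, F)
        trace.append(log_likelihood(G, Q, F))
    return Q, F, trace


# Toy data: 6 individuals, 4 SNPs, two clearly separated groups.
G = [[2, 2, 0, 0], [2, 1, 0, 1], [2, 2, 1, 0],
     [0, 0, 2, 2], [0, 1, 2, 2], [1, 0, 2, 1]]
Q, F, trace = fit_admixture(G, labels=[0, 0, 0, 1, 1, 1], K=2)
```

Starting EM from a clustering solution rather than from random values is the design choice the abstract highlights; in this sketch it simply means the first `Q` and `F` already separate the two groups, and each iteration can only keep or raise the log-likelihood recorded in `trace`.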
Pages: 79-92
Number of pages: 14
Related papers
50 in total
  • [1] Fast and accurate population admixture inference from genotype data from a few microsatellites to millions of SNPs
    Wang, Jinliang
    Heredity, 2022, 129 : 79 - 92
  • [2] Gaussianization for fast and accurate inference from cosmological data
    Schuhmann, Robert L.
    Joachimi, Benjamin
    Peiris, Hiranya V.
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2016, 459 (02) : 1916 - 1928
  • [3] Inference of recent admixture history and parental admixture proportions from genotype and low depth sequencing data
    Garcia-Erill, Genis
    Hanghoj, Kristian
    Heller, Rasmus
    Albrechtsen, Anders
    HUMAN HEREDITY, 2022, VOL. (SUPPL 1) : 30 - 31
  • [4] Fast and accurate estimation of the population-scaled mutation rate, θ, from microsatellite genotype data
    RoyChoudhury, Arindam
    Stephens, Matthew
    GENETICS, 2007, 176 (02) : 1363 - 1366
  • [5] Inference of population structure and admixture proportion from Y chromosomal data of Chinese population
    Song, Mengyuan
    Wang, Xindi
    Zhao, Chenxi
    Qian, Xiaoqin
    Lang, Min
    Hou, Yiping
    Song, Feng
    ELECTROPHORESIS, 2022, 43 (23-24) : 2351 - 2362
  • [6] A fast haplotype inference method for large population genotype data
    Zhang, Ji-Hong
    Wu, Ling-Yun
    Chen, Jian
    Zhang, Xiang-Sun
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2008, 52 (11) : 4891 - 4902
  • [7] ABC inference of multi-population divergence with admixture from unphased population genomic data
    Robinson, John D.
    Bunnefeld, Lynsey
    Hearn, Jack
    Stone, Graham N.
    Hickerson, Michael J.
    MOLECULAR ECOLOGY, 2014, 23 (18) : 4458 - 4471
  • [8] Inference of Population Structure from Time-Series Genotype Data
    Joseph, Tyler A.
    Pe'er, Itsik
    AMERICAN JOURNAL OF HUMAN GENETICS, 2019, 105 (02) : 317 - 333
  • [9] An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data
    Wang, Yi
    Lu, James
    Yu, Jin
    Gibbs, Richard A.
    Yu, Fuli
    GENOME RESEARCH, 2013, 23 (05) : 833 - 842
  • [10] Biogeographical Ancestry Inference from Genotype: A Comparison of Ancestral Informative SNPs and Genome-wide SNPs
    Qu, Yue
    Tran, Dat
    Martinez-Marroquin, Elisa
    2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2020, : 64 - 70