Fast and accurate population admixture inference from genotype data from a few microsatellites to millions of SNPs

Times cited: 17
Authors
Wang, Jinliang [1]
Affiliations
[1] Zool Soc London, Inst Zool, London NW1 4RY, England
Keywords
BAYESIAN-ANALYSIS; GENETIC-STRUCTURE; ANCESTRY; DIFFERENTIATION; IDENTIFICATION; OPTIMIZATION
DOI
10.1038/s41437-022-00535-z
Chinese Library Classification number
Q14 [Ecology (bioecology)]
Subject classification code
071012; 0713
Abstract
Model-based (likelihood and Bayesian) and non-model-based (PCA and K-means clustering) methods were developed to identify populations and assign individuals to the identified populations using marker genotype data. Model-based methods are favoured because they are based on a probabilistic model of population genetics with biologically meaningful parameters and thus produce results that are easily interpretable and applicable. Furthermore, they often yield more accurate structure inferences than non-model-based methods. However, current model-based methods either are computationally demanding and thus applicable to small problems only or use simplified admixture models that could yield inaccurate results in difficult situations such as unbalanced sampling. In this study, I propose new likelihood methods for fast and accurate population admixture inference using genotype data from a few multiallelic microsatellites to millions of diallelic SNPs. The methods conduct first a clustering analysis of coarse-grained population structure by using the mixture model and the simulated annealing algorithm, and then an admixture analysis of fine-grained population structure by using the clustering results as a starting point in an expectation maximisation algorithm. Extensive analyses of both simulated and empirical data show that the new methods compare favourably with existing methods in both accuracy and running speed. They can analyse small datasets with just a few multiallelic microsatellites but can also handle in parallel terabytes of data with millions of markers and millions of individuals. In difficult situations such as many and/or lowly differentiated populations, unbalanced or very small samples of individuals, the new methods are substantially more accurate than other methods.
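The second stage described in the abstract (an expectation maximisation admixture analysis started from clustering results) can be illustrated with a minimal sketch. This is not the author's implementation: it assumes diallelic SNP genotypes coded as 0/1/2 copies of one allele, takes the cluster labels as given rather than finding them by simulated annealing, and uses the textbook EM updates for the standard admixture model. The function names `init_from_clusters`, `log_likelihood` and `em_admixture`, and the `purity` and `eps` parameters, are hypothetical.

```python
import math

def init_from_clusters(genotypes, labels, k, purity=0.9):
    """Starting values from a hard clustering (assumed given; k >= 2)."""
    n, n_loci = len(genotypes), len(genotypes[0])
    # Each individual starts mostly in its own cluster.
    q = [[purity if labels[i] == c else (1 - purity) / (k - 1) for c in range(k)]
         for i in range(n)]
    # Cluster allele frequencies with a small pseudo-count to avoid 0 and 1.
    f = []
    for c in range(k):
        members = [genotypes[i] for i in range(n) if labels[i] == c]
        f.append([(sum(g[l] for g in members) + 0.5) / (2 * len(members) + 1)
                  for l in range(n_loci)])
    return q, f

def log_likelihood(genotypes, q, f):
    """Binomial log-likelihood of the admixture model, up to a constant."""
    total = 0.0
    for i, row in enumerate(genotypes):
        for l, g in enumerate(row):
            p = sum(q[i][c] * f[c][l] for c in range(len(f)))
            total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

def em_admixture(genotypes, q, f, n_iter=200, eps=1e-6):
    """Standard EM updates for ancestry proportions q and frequencies f."""
    n, n_loci, k = len(genotypes), len(genotypes[0]), len(f)
    for _ in range(n_iter):
        q_new = [[0.0] * k for _ in range(n)]
        num = [[0.0] * n_loci for _ in range(k)]
        den = [[0.0] * n_loci for _ in range(k)]
        for i, row in enumerate(genotypes):
            for l, g in enumerate(row):
                p = sum(q[i][c] * f[c][l] for c in range(k))
                for c in range(k):
                    # Expected allele copies of each type drawn from source c.
                    a = g * q[i][c] * f[c][l] / p
                    b = (2 - g) * q[i][c] * (1 - f[c][l]) / (1 - p)
                    q_new[i][c] += (a + b) / (2 * n_loci)
                    num[c][l] += a
                    den[c][l] += a + b
        q = q_new
        # Clamp frequencies away from 0 and 1 so the logs stay finite.
        f = [[min(1 - eps, max(eps, num[c][l] / den[c][l])) if den[c][l] > 0
              else f[c][l] for l in range(n_loci)] for c in range(k)]
    return q, f

# Toy data: two pure individuals per source and one hybrid (last row),
# with the hybrid deliberately placed in cluster 0 by the hard clustering.
genotypes = [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2], [1, 1, 1, 1]]
labels = [0, 0, 1, 1, 0]
q0, f0 = init_from_clusters(genotypes, labels, k=2)
q_hat, f_hat = em_admixture(genotypes, q0, f0)
```

In this toy case the hard clustering forces the hybrid into one cluster, while the admixture stage is free to split its ancestry between the two sources, which is the motivation the abstract gives for running the two analyses in sequence.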
Pages: 79-92
Number of pages: 14