Clumpak: a program for identifying clustering modes and packaging population structure inferences across K

Times cited: 2426
Authors
Kopelman, Naama M. [1]
Mayzel, Jonathan [1]
Jakobsson, Mattias [2, 3]
Rosenberg, Noah A. [4]
Mayrose, Itay [1]
Affiliations
[1] Tel Aviv Univ, Dept Mol Biol & Ecol Plants, IL-69978 Ramat Aviv, Israel
[2] Uppsala Univ, Dept Evolutionary Biol, S-75236 Uppsala, Sweden
[3] Uppsala Univ, SciLife Lab, S-75236 Uppsala, Sweden
[4] Stanford Univ, Dept Biol, Stanford, CA 94305 USA
Funding
Israel Science Foundation
Keywords
admixture; ancestry; clustering; population structure; MULTILOCUS GENOTYPE DATA; GENETIC DIFFERENTIATION; ANCESTRY; ASSOCIATION; ASSIGNMENT; ALGORITHM; ADMIXTURE; SOFTWARE; EASTERN; NUMBER;
DOI
10.1111/1755-0998.12387
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at , simplifies the use of model-based analyses of population structure in population genetics and molecular ecology.
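The abstract describes grouping replicate runs at a fixed K into modes by applying a Markov clustering algorithm to a run-by-run similarity matrix. The following is a minimal sketch of that idea, not Clumpak's implementation: the function names, the similarity threshold, the inflation value and the example matrix are all assumptions made for illustration, and the similarity scores are taken as given (in Clumpak they are computed by Clumpp).

```python
# Illustrative sketch only: a plain Markov clustering (MCL) loop applied to a
# made-up similarity matrix between replicate runs. All names and parameter
# values here are hypothetical and are not taken from Clumpak.

def normalize_columns(matrix):
    """Rescale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = matrix[i][j] / total if total > 0 else 0.0
    return out


def matmul(a, b):
    """Square matrix product, used for the MCL expansion step."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(similarity, threshold=0.9, inflation=2.0,
                   max_iter=100, tol=1e-9):
    """Group runs into modes; returns a sorted list of sorted index lists."""
    n = len(similarity)
    # Assumed preprocessing: drop weak edges and add self-loops.
    m = [[1.0 if i == j
          else (similarity[i][j] if similarity[i][j] >= threshold else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)                      # expansion
        inflated = normalize_columns(                # inflation
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that retain mass list the members of one cluster (mode).
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical pairwise similarities for five runs at one value of K.
example_similarity = [
    [1.00, 0.98, 0.98, 0.60, 0.60],
    [0.98, 1.00, 0.98, 0.60, 0.60],
    [0.98, 0.98, 1.00, 0.60, 0.60],
    [0.60, 0.60, 0.60, 1.00, 0.97],
    [0.60, 0.60, 0.60, 0.97, 1.00],
]
modes = markov_cluster(example_similarity)
print(modes)
```

In this toy input, runs 0 to 2 and runs 3 to 4 are mutually similar but differ from each other, so the sketch separates them into two groups. A consensus solution per mode, as described in the abstract, would then be computed from the runs within each group.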
Pages: 1179-1191
Number of pages: 13