Clines, clusters, and the effect of study design on the inference of human population structure

Cited by: 405
Authors
Rosenberg, NA [1]
Mahajan, S
Ramachandran, S
Zhao, CF
Pritchard, JK
Feldman, MW
Affiliations
[1] Univ Michigan, Dept Human Genet, Bioinformat Program, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Inst Life Sci, Ann Arbor, MI USA
[3] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
[4] Stanford Univ, Dept Biol Sci, Stanford, CA 94305 USA
[5] Marshfield Clin Res Fdn, Mammalian Genotyping Serv, Marshfield, WI USA
[6] Univ Chicago, Dept Human Genet, Chicago, IL USA
Source
PLOS GENETICS | 2005 / Vol. 1 / Issue 06
DOI
10.1371/journal.pgen.0010070
Chinese Library Classification number
Q3 [Genetics];
Subject classification code
071007 ; 090102 ;
Abstract
Previously, we observed that without using prior information about individual sampling locations, a clustering algorithm applied to multilocus genotypes from worldwide human populations produced genetic clusters largely coincident with major geographic regions. It has been argued, however, that the degree of clustering is diminished by use of samples with greater uniformity in geographic distribution, and that the clusters we identified were a consequence of uneven sampling along genetic clines. Expanding our earlier dataset from 377 to 993 markers, we systematically examine the influence of several study design variables (sample size, number of loci, number of clusters, assumptions about correlations in allele frequencies across populations, and the geographic dispersion of the sample) on the "clusteredness" of individuals. With all other variables held constant, geographic dispersion is seen to have comparatively little effect on the degree of clustering. Examination of the relationship between genetic and geographic distance supports a view in which the clusters arise not as an artifact of the sampling scheme, but from small discontinuous jumps in genetic distance for most population pairs on opposite sides of geographic barriers, in comparison with genetic distance for pairs on the same side. Thus, analysis of the 993-locus dataset corroborates our earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions.
Pages: 660-671
Page count: 12
Related papers
23 in total
  • [1] [Anonymous], 1987, Statistical Analysis of Spherical Data
  • [2] Human population genetic structure and inference of group membership
    Bamshad, MJ
    Wooding, S
    Watkins, WS
    Ostler, CT
    Batzer, MA
    Jorde, LB
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2003, 72 (03) : 578 - 589
  • [3] HIGH-RESOLUTION OF HUMAN EVOLUTIONARY TREES WITH POLYMORPHIC MICROSATELLITES
    BOWCOCK, AM
    RUIZLINARES, A
    TOMFOHRDE, J
    MINCH, E
    KIDD, JR
    CAVALLISFORZA, LL
    [J]. NATURE, 1994, 368 (6470) : 455 - 457
  • [4] Cann HM, 2002, SCIENCE, V296, P261
  • [5] Cavalli-Sforza L. L., 1994, HIST GEOGRAPHY HUMAN
  • [6] CHEBRANIOUS N, 2003, BMC GENOMICS, V4, P6
  • [7] BAPS 2: enhanced possibilities for the analysis of genetic population structure
    Corander, J
    Waldmann, P
    Marttinen, P
    Sillanpää, MJ
    [J]. BIOINFORMATICS, 2004, 20 (15) : 2363 - 2369
  • [8] Falush D, 2003, GENETICS, V164, P1567
  • [9] CHOOSING A POINT FROM SURFACE OF A SPHERE
    MARSAGLIA, G
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1972, 43 (02) : 645 - +
  • [10] Multilocus genotypes, a tree of individuals, and human evolutionary history
    Mountain, JL
    CavalliSforza, LL
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 1997, 61 (03) : 705 - 718