Prediction of biogeographical ancestry from genotype: a comparison of classifiers

Cited by: 20
Authors
Cheung, Elaine Y. Y. [1]
Gahan, Michelle Elizabeth [1]
McNevin, Dennis [1]
Affiliations
[1] Univ Canberra, Natl Ctr Forens Studies, Fac Educ Sci Technol & Math ESTeM, Bruce, ACT 2601, Australia
Keywords
Biogeographical ancestry (BGA); Phenotype prediction; STRUCTURE; Bayesian; Genetic distance; Multinomial logistic regression; DETERMINING CONTINENTAL ORIGIN; GENOME-WIDE PATTERNS; POPULATION-STRUCTURE; INFORMATIVE MARKERS; DIVERSITY; ADMIXTURE; PANEL; ASSAY; AMERICANS; INFERENCE;
DOI
10.1007/s00414-016-1504-3
Chinese Library Classification (CLC) number
DF [Law]; D9 [Law]; R [Medicine and Health]
Subject classification code
0301; 10
Abstract
DNA can provide forensic intelligence regarding a donor's biogeographical ancestry (BGA) and other externally visible characteristics (EVCs). A number of algorithms have been proposed to assign individual human genotypes to a BGA using ancestry informative marker (AIM) panels. This study compares the BGA assignment accuracy of the population clustering program STRUCTURE with that of three generic classification approaches: a Bayesian algorithm, genetic distance, and multinomial logistic regression (MLR). A selection of 142 ancestry informative single nucleotide polymorphisms (SNPs) was chosen from existing marker panels (SNPforID 34-plex, Eurasiaplex, Seldin, and Kidd's AIM panels) to assess BGA classification at the continental level for Africans, Europeans, East Asians, and Amerindians. Each classifier used a training set of 1093 individuals with self-declared BGA from the 1000 Genomes phase 1 database to predict BGA in a test set of 516 individuals from the HGDP-CEPH (Stanford) cell line panel. Tests were repeated with 0, 10, 50, 70, and 90% of the genotypes missing. Comparison of the areas under the receiver operating characteristic curves (AUROCs) showed high accuracy for STRUCTURE and the generic Bayesian approach. The latter algorithm offers a computationally simpler alternative to STRUCTURE with little loss in accuracy and, unlike STRUCTURE, is suitable for phenotype prediction.
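The abstract does not spell out the generic Bayesian algorithm, so the following is only a minimal sketch of one common way such a classifier can be built, not the authors' implementation. It assumes that SNPs are independent, that genotypes follow Hardy-Weinberg proportions within each population, and that missing genotypes are simply skipped. The function names, the population labels, and the toy allele frequencies are all hypothetical and chosen for illustration.

```python
import math


def genotype_log_likelihood(genotype, freq):
    """Log P(genotype | population) under Hardy-Weinberg proportions.

    genotype: copies of the reference allele carried (0, 1, or 2).
    freq: reference-allele frequency in the population (assumed known
          from a training set).
    """
    # Clamp the frequency so a fixed allele never gives log(0).
    p = min(max(freq, 1e-6), 1.0 - 1e-6)
    q = 1.0 - p
    probs = {0: q * q, 1: 2.0 * p * q, 2: p * p}
    return math.log(probs[genotype])


def posterior(genotypes, pop_freqs, priors=None):
    """Posterior probability of each population for one individual.

    genotypes: list of 0/1/2 allele counts, with None for a missing call.
    pop_freqs: dict mapping population label -> list of allele frequencies,
               one per SNP, in the same order as `genotypes`.
    priors: optional dict of prior probabilities (uniform if omitted).
    """
    pops = list(pop_freqs)
    if priors is None:
        priors = {pop: 1.0 / len(pops) for pop in pops}

    log_post = {}
    for pop in pops:
        total = math.log(priors[pop])
        for g, f in zip(genotypes, pop_freqs[pop]):
            if g is None:  # missing genotype: contributes no evidence
                continue
            total += genotype_log_likelihood(g, f)
        log_post[pop] = total

    # Normalise in log space to avoid underflow with many SNPs.
    peak = max(log_post.values())
    weights = {pop: math.exp(v - peak) for pop, v in log_post.items()}
    norm = sum(weights.values())
    return {pop: w / norm for pop, w in weights.items()}


# Toy, made-up allele frequencies for three SNPs in two populations.
toy_freqs = {"POP_A": [0.9, 0.1, 0.8], "POP_B": [0.1, 0.9, 0.2]}
result = posterior([2, 0, None], toy_freqs)
best = max(result, key=result.get)
```

In this sketch an individual is assigned to the population with the highest posterior, and the posterior itself could serve as the score for an ROC analysis. Skipping missing calls is one simple way to handle incomplete profiles; with every genotype missing the posterior falls back to the prior.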
Pages: 901-912
Number of pages: 12