Prediction of biogeographical ancestry from genotype: a comparison of classifiers

Times cited: 22
Authors
Cheung, Elaine Y. Y. [1]
Gahan, Michelle Elizabeth [1]
McNevin, Dennis [1]
Affiliation
[1] Univ Canberra, Natl Ctr Forens Studies, Fac Educ Sci Technol & Math ESTeM, Bruce, ACT 2601, Australia
Keywords
Biogeographical ancestry (BGA); Phenotype prediction; STRUCTURE; Bayesian; Genetic distance; Multinomial logistic regression; DETERMINING CONTINENTAL ORIGIN; GENOME-WIDE PATTERNS; POPULATION-STRUCTURE; INFORMATIVE MARKERS; DIVERSITY; ADMIXTURE; PANEL; ASSAY; AMERICANS; INFERENCE
DOI
10.1007/s00414-016-1504-3
Chinese Library Classification (CLC) number
DF [Law]; D9 [Law]; R [Medicine and health]
Discipline classification code
0301; 10
Abstract
DNA can provide forensic intelligence about a donor's biogeographical ancestry (BGA) and other externally visible characteristics (EVCs). A number of algorithms have been proposed to assign individual human genotypes to a BGA using ancestry informative marker (AIM) panels. This study compares the BGA assignment accuracy of the population clustering program STRUCTURE with that of three generic classification approaches: a Bayesian algorithm, genetic distance, and multinomial logistic regression (MLR). A total of 142 ancestry informative single nucleotide polymorphisms (SNPs) were selected from existing marker panels (SNPforID 34-plex, Eurasiaplex, Seldin's, and Kidd's AIM panels) to assess BGA classification at the continental level for Africans, Europeans, East Asians, and Amerindians. Each classifier used a training set of 1093 individuals with self-declared BGA from the 1000 Genomes Phase 1 database to predict BGA in a test set of 516 individuals from the HGDP-CEPH (Stanford) cell line panel. Tests were repeated with 0, 10, 50, 70, and 90% of the genotypes missing. Comparison of the areas under the receiver operating characteristic curves (AUROCs) showed high accuracy for STRUCTURE and the generic Bayesian approach. The latter offers a computationally simpler alternative to STRUCTURE with little loss in accuracy and, unlike STRUCTURE, is suitable for phenotype prediction.
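The abstract names a generic Bayesian classifier and AUROC scoring without giving their form. The sketch below shows one common way such a classifier can be built; it is an illustrative naive Bayes model, not necessarily the authors' implementation. It assumes SNPs are independent given ancestry, genotypes follow Hardy-Weinberg proportions, and populations have equal priors. The function names (`estimate_allele_freqs`, `posterior`, `auroc`), the 0/1/2 allele-count coding, the pseudocount and the toy data are all hypothetical. Missing genotypes are skipped, and AUROC is computed one-vs-rest as a Mann-Whitney probability.

```python
import math
from typing import Dict, List, Optional

# Hypothetical coding: each genotype is the number of copies (0, 1 or 2)
# of a chosen allele at one SNP; None marks a missing call.
Genotype = Optional[int]


def estimate_allele_freqs(
    training: Dict[str, List[List[Genotype]]], pseudocount: float = 0.5
) -> Dict[str, List[float]]:
    """Per-population allele frequency at each SNP from labelled training genotypes."""
    freqs: Dict[str, List[float]] = {}
    for pop, samples in training.items():
        n_snps = len(samples[0])
        pop_freqs = []
        for j in range(n_snps):
            called = [s[j] for s in samples if s[j] is not None]
            # Smoothing (an assumption) keeps frequencies strictly between 0 and 1,
            # so no test genotype is assigned zero probability.
            p = (sum(called) + pseudocount) / (2 * len(called) + 2 * pseudocount)
            pop_freqs.append(p)
        freqs[pop] = pop_freqs
    return freqs


def log_genotype_prob(g: int, p: float) -> float:
    """Log P(genotype | allele frequency p) under Hardy-Weinberg proportions."""
    if g == 2:
        return 2 * math.log(p)
    if g == 1:
        return math.log(2 * p * (1 - p))
    return 2 * math.log(1 - p)


def posterior(
    genotypes: List[Genotype],
    freqs: Dict[str, List[float]],
    priors: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Posterior probability of each population, treating SNPs as independent."""
    pops = list(freqs)
    if priors is None:
        priors = {pop: 1 / len(pops) for pop in pops}  # equal priors (assumption)
    log_post = {}
    for pop in pops:
        lp = math.log(priors[pop])
        for g, p in zip(genotypes, freqs[pop]):
            if g is None:
                continue  # a missing SNP contributes no evidence
            lp += log_genotype_prob(g, p)
        log_post[pop] = lp
    # Normalise in log space (log-sum-exp) to avoid underflow with many SNPs.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {pop: math.exp(v - m) / z for pop, v in log_post.items()}


def auroc(scores: List[float], is_positive: List[bool]) -> float:
    """One-vs-rest AUROC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for two hypothetical populations and three SNPs.
    training = {
        "A": [[2, 2, 0], [2, 1, 0], [1, 2, 0]],
        "B": [[0, 0, 2], [0, 1, 2], [1, 0, 1]],
    }
    freqs = estimate_allele_freqs(training)
    post = posterior([2, None, 0], freqs)  # second SNP missing
    print(max(post, key=post.get))  # prints "A"
```

Because the SNPs are treated as independent, skipping a missing genotype is equivalent to marginalising it out. This is one reason a classifier of this kind can still be applied as the proportion of missing genotypes rises, although each missing SNP removes evidence from the posterior.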
Pages: 901-912
Number of pages: 12