Copy number variation signature to predict human ancestry

Cited by: 6
Authors
Pronold, Melissa [1, 2, 3]
Vali, Marzieh [1, 2]
Pique-Regi, Roger [4]
Asgharzadeh, Shahab [1, 2]
Affiliations
[1] Univ So Calif, Childrens Hosp Los Angeles, Dept Pediat, Los Angeles, CA 90089 USA
[2] Univ So Calif, Keck Sch Med, Saban Res Inst, Los Angeles, CA 90033 USA
[3] Univ So Calif, Keck Sch Med, Dept Prevent Med, Los Angeles, CA 90033 USA
[4] Wayne State Univ, Sch Med, Dept Clin & Translat Sci, Detroit, MI USA
Keywords
COMPARATIVE GENOMIC HYBRIDIZATION; HIDDEN-MARKOV MODEL; DETECT; CANCER; ABERRATIONS; ALGORITHM; VARIANTS; MAP;
DOI
10.1186/1471-2105-13-336
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Copy number variations (CNVs) are genomic structural variants that are found in healthy populations and have been associated with disease susceptibility. Existing CNV detection methods usually operate on a sample-by-sample basis, which is not ideal for large datasets where common CNVs must be estimated by comparing the frequency of CNVs across individual samples. Here we describe a simple and novel approach to locate genome-wide CNVs common to a specific population, using human ancestry as the phenotype.
Results: We used our previously published Genome Alteration Detection Analysis (GADA) algorithm to identify common ancestry CNVs (caCNVs) and built a caCNV model to predict population structure. We identified a 73-caCNV signature using a training set of 225 healthy individuals of European, Asian, and African ancestry. The signature was validated on an independent test set of 300 individuals with similar ancestral backgrounds. The error rate in predicting ancestry in this test set was 2% using the 73-caCNV signature. Among the caCNVs identified, several were previously confirmed experimentally to vary by ancestry. Our signature also contains a caCNV region with a single microRNA (MIR270), which represents the first reported variation of a microRNA by ancestry.
Conclusions: We developed a new methodology to identify common CNVs and demonstrated its performance by building a caCNV signature that predicts human ancestry with high accuracy. The approach could be extended to large case-control studies to identify CNV signatures for other phenotypes such as disease susceptibility and drug response.
Pages: 10