An eigenvalue ratio approach to inferring population structure from whole genome sequencing data

Times cited: 1
Authors
Xu, Yuyang [1 ]
Liu, Zhonghua [1 ]
Yao, Jianfeng [2 ]
Affiliations
[1] Univ Hong Kong, Dept Stat & Actuarial Sci, Hong Kong, Peoples R China
[2] Chinese Univ Hong Kong Shenzhen, Sch Data Sci, Shenzhen, Peoples R China
Keywords
population structure; principal component; random matrix theory; sequencing data; spectral analysis; SAMPLE COVARIANCE MATRICES; PRINCIPAL-COMPONENTS; STRATIFICATION; NUMBER; RARE; ASSOCIATION; COMMON; SCALE; LIMIT; LAW
DOI
10.1111/biom.13691
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Inference of population structure from genetic data plays an important role in population and medical genetics studies. With the advancement and decreasing cost of sequencing technology, increasingly available whole genome sequencing data provide much richer information about the underlying population structure. The traditional method for computing and selecting the top principal components (PCs) that capture population structure was originally developed for array-based genotype data and may not perform well on sequencing data, for two reasons. First, the number of genetic variants $p$ is much larger than the sample size $n$ in sequencing data, so the sample-to-marker ratio $n/p$ is nearly zero, violating the assumption of the Tracy-Widom test used in that method. Second, that method might not handle the linkage disequilibrium in sequencing data well. To resolve these two practical issues, we propose a new method called ERStruct to determine the number of top informative PCs based on sequencing data. More specifically, we propose to use the ratio of consecutive eigenvalues as a more robust test statistic, and then we approximate its null distribution using modern random matrix theory. Both simulation studies and applications to two public data sets from the HapMap 3 and the 1000 Genomes Projects demonstrate the empirical performance of our ERStruct method.
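To make the idea of a consecutive-eigenvalue ratio concrete, here is a minimal Python sketch. It is not the ERStruct procedure: the abstract says ERStruct calibrates the ratio against a null distribution derived from random matrix theory, whereas this sketch simply picks the sharpest drop among the leading ratios. The function names (`eigenvalue_ratios`, `estimate_num_pcs`), the cutoff `k_max`, and the simulated three-population genotype matrix are all assumptions made for illustration.

```python
import numpy as np

def eigenvalue_ratios(genotypes):
    """Return descending eigenvalues and the ratios lambda_{k+1} / lambda_k.

    genotypes: (n, p) array of allele counts for n individuals, p variants.
    """
    X = np.asarray(genotypes, dtype=float)
    n, p = X.shape
    X = X - X.mean(axis=0)  # center each variant across individuals
    # With p >> n, use the n x n Gram matrix; its nonzero eigenvalues
    # match those of the p x p sample covariance matrix up to scaling.
    gram = X @ X.T / p
    eigvals = np.linalg.eigvalsh(gram)[::-1]  # sort in descending order
    eigvals = eigvals[: n - 1]  # centering leaves at most n - 1 nonzero values
    ratios = eigvals[1:] / eigvals[:-1]
    return eigvals, ratios

def estimate_num_pcs(ratios, k_max):
    """Heuristic (hypothetical, no null calibration): sharpest drop in the first k_max ratios."""
    return int(np.argmin(ratios[:k_max])) + 1

# Simulated data (an assumption, not from the paper): 3 populations,
# 20 individuals each, 2000 variants with independent allele frequencies.
rng = np.random.default_rng(0)
n_per_pop, n_pops, p = 20, 3, 2000
freqs = rng.uniform(0.1, 0.9, size=(n_pops, p))
labels = np.repeat(np.arange(n_pops), n_per_pop)
genotypes = rng.binomial(2, freqs[labels])  # shape (60, 2000)

n = genotypes.shape[0]
eigvals, ratios = eigenvalue_ratios(genotypes)
k_hat = estimate_num_pcs(ratios, k_max=10)
print("estimated number of informative PCs:", k_hat)
```

In this simulation the three populations differ strongly, so after centering the group means span two directions and the ratio drops sharply after the second eigenvalue. Real sequencing data would call for a calibrated threshold of the kind the abstract describes, not a plain argmin.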
Pages: 891-902
Page count: 12