Determination of nonlinear genetic architecture using compressed sensing

Cited by: 4
Authors
Ho, Chiu Man [1]
Hsu, Stephen D. H. [1]
Affiliations
[1] Michigan State Univ, Dept Phys & Astron, E Lansing, MI 48824 USA
Keywords
Genomics; Compressed sensing; Nonlinear interactions; Selection; Neighborliness; Polytopes; Models
DOI
10.1186/s13742-015-0081-6
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: One of the fundamental problems of modern genomics is to extract the genetic architecture of a complex trait from a data set of individual genotypes and trait values. Establishing this important connection between genotype and phenotype is complicated by the large number of candidate genes, the potentially large number of causal loci, and the likely presence of some nonlinear interactions between different genes. Compressed Sensing methods obtain solutions to under-constrained systems of linear equations. These methods can be applied to the problem of determining the best model relating genotype to phenotype, and generally deliver better performance than simply regressing the phenotype against each genetic variant, one at a time. We introduce a Compressed Sensing method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. Our method uses L1-penalized regression applied to nonlinear functions of the sensing matrix.
Results: The computational and data resource requirements for our method are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exist. It seems plausible that most genetic architectures fall in this category. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using simulated human genomes and the small amount of currently available real data. A phase transition (i.e., dramatic and qualitative change) in the behavior of the algorithm indicates when sufficient data is available for its successful application.
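The idea stated in the Background, L1-penalized regression applied to nonlinear functions of the sensing matrix, can be illustrated with a small sketch. This is not the authors' code. It assumes genotypes coded 0/1/2, restricts the nonlinear functions to pairwise products of loci, and uses a plain iterative soft-thresholding (ISTA) solver. The function names (`expand_with_interactions`, `standardize`, `lasso_ista`), the toy sizes, the chosen causal terms, and the penalty `lam` are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def expand_with_interactions(G):
    """Return [G | all pairwise products g_i * g_j, i < j] plus column labels.

    Assumes G has at least two columns (loci).
    """
    p = G.shape[1]
    pairs = list(itertools.combinations(range(p), 2))
    products = np.column_stack([G[:, i] * G[:, j] for i, j in pairs])
    labels = [(i,) for i in range(p)] + pairs
    return np.hstack([G, products]), labels


def standardize(A):
    """Center each column and scale it to unit variance (constant columns are only centered)."""
    sd = A.std(axis=0)
    sd[sd == 0] = 1.0
    return (A - A.mean(axis=0)) / sd


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(A, y, lam, n_iter=2000):
    """Minimize (1/2n) * ||y - A w||^2 + lam * ||w||_1 by iterative soft thresholding."""
    n = A.shape[0]
    step = n / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


# Toy data (hypothetical): 150 individuals, 20 loci, so 20 linear + 190 pairwise columns.
rng = np.random.default_rng(0)
n, p = 150, 20
G = rng.integers(0, 3, size=(n, p)).astype(float)
A, labels = expand_with_interactions(G)
A = standardize(A)

# Sparse nonlinear model: two additive effects and one gene-gene interaction.
w_true = np.zeros(A.shape[1])
w_true[labels.index((2,))] = 1.0
w_true[labels.index((7,))] = -0.8
w_true[labels.index((4, 11))] = 0.9
y = A @ w_true + 0.1 * rng.standard_normal(n)
y = y - y.mean()

w_hat = lasso_ista(A, y, lam=0.1)
selected = [labels[k] for k in np.flatnonzero(np.abs(w_hat) > 0.05)]
print("terms selected:", selected)
```

The expanded matrix has more columns (210) than rows (150), so the linear system is under-constrained, and the L1 penalty is what favors a sparse set of linear and interaction terms. Whether the true terms are recovered depends on sample size, noise, and the penalty, which is the dependence the phase transition mentioned in the Results describes.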
Conclusion: Our results indicate that predictive models for many complex traits, including a variety of human disease susceptibilities (e.g., with additive heritability h² ∼ 0.5), can be extracted from data sets comprised of n* ∼ 100 s individuals, where s is the number of distinct causal variants influencing the trait. For example, given a trait controlled by ∼10k loci, roughly a million individuals would be sufficient for application of the method.
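As a quick check of the scaling quoted in the Conclusion, the rule can be read as roughly one hundred individuals per causal variant. The worked line below is an order-of-magnitude estimate under that reading, not an exact bound:

```latex
n_* \sim 100\,s, \qquad s \sim 10^{4} \;\Longrightarrow\; n_* \sim 100 \times 10^{4} = 10^{6}
```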
Pages: 13