Testing SNPs and sets of SNPs for importance in association studies

Cited by: 30
Authors
Schwender, Holger [1]
Ruczinski, Ingo [1]
Ickstadt, Katja [2]
Affiliations
[1] Johns Hopkins Univ, Dept Biostat, Baltimore, MD 21205 USA
[2] TU Dortmund Univ, Dept Stat, D-44221 Dortmund, Germany
Funding
National Institutes of Health (USA);
Keywords
Feature selection; GENICA; Importance measure; logicFS; Logic regression; GENOME-WIDE ASSOCIATION; HIGH-ORDER INTERACTIONS; GENE-EXPRESSION DATA; CANCER RISK; ENRICHMENT ANALYSIS; IDENTIFICATION; MICROARRAY; REGRESSION; ALGORITHM;
DOI
10.1093/biostatistics/kxq042
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
A major goal of genetic association studies concerned with single nucleotide polymorphisms (SNPs) is the detection of SNPs exhibiting an impact on the risk of developing a disease. Typically, this problem is approached by testing each of the SNPs individually. This, however, can lead to an inaccurate measurement of the influence of the SNPs on the disease risk, in particular, if SNPs only show an effect when interacting with other SNPs, as the multivariate structure of the data is ignored. In this article, we propose a testing procedure based on logic regression that takes this structure into account and therefore enables a more appropriate quantification of importance and ranking of the SNPs than marginal testing. Since even SNP interactions often exhibit only a moderate effect on the disease risk, it can be helpful to also consider sets of SNPs (e.g. SNPs belonging to the same gene or pathway) to borrow strength across these SNP sets and to identify those genes or pathways comprising SNPs that are most consistently associated with the response. We show how the proposed procedure can be adapted for testing SNP sets, and how it can be applied to blocks of SNPs in linkage disequilibrium (LD) to overcome problems caused by LD.
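The abstract's point that marginal testing can miss SNPs that act only through interactions can be illustrated with a small toy example. The sketch below is not the logic regression procedure proposed in the article. It assumes a hypothetical, perfectly balanced data set with two binary SNP indicators (`snp1`, `snp2`) and a made-up disease model in which a subject is a case exactly when one of the two variants is present. The helper `chi_square_2x2` is likewise an illustrative name, computing a plain Pearson chi-square statistic.

```python
from itertools import product


def chi_square_2x2(x, y):
    """Pearson chi-square statistic for two binary sequences (no continuity correction)."""
    n = len(x)
    counts = {(a, b): 0 for a, b in product((0, 1), repeat=2)}
    for a, b in zip(x, y):
        counts[(a, b)] += 1
    stat = 0.0
    for (a, b), observed in counts.items():
        row_total = sum(counts[(a, j)] for j in (0, 1))
        col_total = sum(counts[(i, b)] for i in (0, 1))
        expected = row_total * col_total / n
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 25 subjects for each of the four (snp1, snp2) combinations.
snp1, snp2 = [], []
for a, b in product((0, 1), repeat=2):
    snp1 += [a] * 25
    snp2 += [b] * 25

# Assumed disease model (illustration only): case if exactly one variant is present.
case = [a ^ b for a, b in zip(snp1, snp2)]

# The same model written as a logic expression over the two SNPs.
logic_term = [int((a and not b) or (not a and b)) for a, b in zip(snp1, snp2)]

marginal_1 = chi_square_2x2(snp1, case)   # each SNP tested on its own
marginal_2 = chi_square_2x2(snp2, case)
joint = chi_square_2x2(logic_term, case)  # the interaction tested as one term

print(marginal_1, marginal_2, joint)
```

In this constructed setting, each single-SNP table is perfectly balanced, so neither marginal test carries any signal, while the logic term reproduces the case status exactly. Real data are noisier, and the article's procedure searches for such logic expressions rather than being handed them.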
Pages: 18-32
Number of pages: 15