Identification of SNP interactions using logic regression

Cited by: 100
Authors
Schwender, Holger [1]
Ickstadt, Katja [1]
Affiliations
[1] Univ Dortmund, Collaborat Res Ctr 475, Dept Stat, D-44221 Dortmund, Germany
Keywords
feature selection; GENICA; single nucleotide polymorphism; variable importance measure
DOI
10.1093/biostatistics/kxm024
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Interactions of single nucleotide polymorphisms (SNPs) are assumed to be responsible for complex diseases such as sporadic breast cancer. Important goals of studies concerned with such genetic data are thus to identify combinations of SNPs that lead to a higher risk of developing a disease and to measure the importance of these interactions. There are many approaches based on classification methods such as CART and random forests that allow measuring the importance of single variables. However, none of these methods enables the importance of combinations of variables to be quantified directly. In this paper, we show how logic regression can be employed to identify SNP interactions explanatory for the disease status in a case-control study and propose 2 measures for quantifying the importance of these interactions for classification. These approaches are then applied both to simulated data sets and to the SNP data of the GENICA study, a study dedicated to the identification of genetic and gene-environment interactions associated with sporadic breast cancer.
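
As an illustration of the idea summarized in the abstract, the following minimal Python sketch shows how a Boolean combination of SNP variables can act as a case-control classifier and how one might score each interaction by removing it from the model. Everything here is an assumption made for illustration and is not taken from the record: the dominant/recessive binary coding of genotypes, the names `to_binary`, `predict`, and `removal_importance`, the hand-made toy data, and the removal-based score itself, which is only in the spirit of an importance measure and is not claimed to match either of the 2 measures proposed in the paper.

```python
from typing import Dict, List, Sequence, Tuple

# A genotype is a sequence of minor-allele counts (0, 1 or 2), one per SNP.
Genotype = Sequence[int]
# A conjunction is a list of (variable name, required truth value) literals.
Conjunction = List[Tuple[str, bool]]


def to_binary(genotype: Genotype) -> Dict[str, bool]:
    """Assumed coding: two binary variables per SNP.

    S<i>_D is true if at least one minor allele is present (dominant effect),
    S<i>_R is true if both alleles are the minor allele (recessive effect).
    """
    features: Dict[str, bool] = {}
    for i, count in enumerate(genotype, start=1):
        features[f"S{i}_D"] = count >= 1
        features[f"S{i}_R"] = count == 2
    return features


def holds(conj: Conjunction, features: Dict[str, bool]) -> bool:
    """True if every literal of the conjunction is satisfied."""
    return all(features[name] == value for name, value in conj)


def predict(dnf: List[Conjunction], features: Dict[str, bool]) -> bool:
    """Classify as 'case' if any conjunction (interaction) holds."""
    return any(holds(conj, features) for conj in dnf)


def n_correct(dnf: List[Conjunction], data) -> int:
    """Number of observations whose predicted status equals the true status."""
    return sum(predict(dnf, to_binary(g)) == is_case for g, is_case in data)


def removal_importance(dnf: List[Conjunction], data) -> List[int]:
    """Hypothetical score: drop in correct classifications when one
    conjunction is removed from the model."""
    full = n_correct(dnf, data)
    return [full - n_correct(dnf[:i] + dnf[i + 1:], data) for i in range(len(dnf))]


# Hand-made toy data (genotype, is_case); purely illustrative, not real SNP data.
data = [
    ((1, 0, 0), True),
    ((2, 1, 0), True),
    ((0, 0, 2), True),
    ((0, 2, 0), False),
    ((1, 2, 0), False),
    ((0, 1, 1), False),
]

# Illustrative model: (S1_D AND NOT S2_R) OR S3_R
dnf = [[("S1_D", True), ("S2_R", False)], [("S3_R", True)]]

if __name__ == "__main__":
    print("correctly classified:", n_correct(dnf, data))
    print("importance per interaction:", removal_importance(dnf, data))
```

The sketch takes the logic expression as given; actually searching for a good expression, which is what logic regression does, is outside its scope.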
Pages: 187-198
Page count: 12