Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data

Cited by: 2
Authors
Won, Sungho [1]
Choi, Hosik [2]
Park, Suyeon [3, 4]
Lee, Juyoung [4]
Park, Changyi [5]
Kwon, Sunghoon [6]
Affiliations
[1] Seoul Natl Univ, Dept Publ Hlth Sci, Seoul, South Korea
[2] Kyonggi Univ, Dept Appl Informat Stat, Suwon, South Korea
[3] Soonchunhyang Univ, Coll Med, Dept Biostat, Seoul, South Korea
[4] Natl Inst Hlth, Ctr Genome Sci, Seoul, South Korea
[5] Univ Seoul, Dept Stat, Seoul, South Korea
[6] Konkuk Univ, Dept Appl Stat, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
GENOME-WIDE ASSOCIATION; VARIABLE SELECTION; RIDGE REGRESSION; DIVERGING NUMBER; ADAPTIVE LASSO; CLASSIFICATION; SHRINKAGE; RISK;
DOI
10.1155/2015/605891
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
Owing to recent improvements in genotyping technology, large-scale genetic data can be used to identify disease susceptibility loci, and these findings have substantially improved our understanding of complex diseases. Despite these successes, most genetic effects for many complex diseases have been found to be very small, which has been a major hurdle to building disease prediction models. Recently, many statistical methods based on penalized regression have been proposed to tackle the so-called "large P and small N" problem. Penalized regressions, including the least absolute shrinkage and selection operator (LASSO) and ridge regression, limit the parameter space, and this constraint enables the estimation of effects for a very large number of SNPs. Various extensions have been suggested, and in this report we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration.
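The way a penalty makes estimation feasible when SNPs outnumber samples can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a continuous phenotype with squared-error loss (the paper concerns disease status, for which a logistic loss would be the natural choice), simulated genotypes coded 0/1/2, and hypothetical names (`penalized_fit`, `lam_max`, `true_beta`) and sizes chosen only for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalized_fit(X, y, lam, penalty="lasso", n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + penalty(b).

    penalty="lasso": lam * sum(|b_j|)
    penalty="ridge": (lam / 2) * sum(b_j ** 2)
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the partial residual (b_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            if penalty == "lasso":
                new = soft_threshold(rho, lam) / col_sq[j]
            else:
                new = rho / (col_sq[j] + lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


# Simulated "large P, small N" data: 30 samples, 60 SNPs (assumed sizes).
random.seed(0)
n, p = 30, 60
X = [[float(random.choice([0, 1, 2])) for _ in range(p)] for _ in range(n)]
for j in range(p):  # center each SNP column
    mean_j = sum(X[i][j] for i in range(n)) / n
    for i in range(n):
        X[i][j] -= mean_j

true_beta = [0.0] * p
true_beta[0], true_beta[1], true_beta[2] = 0.8, -0.6, 0.5  # three causal SNPs
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + random.gauss(0, 0.5)
     for i in range(n)]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

# Smallest lasso penalty at which every coefficient is shrunk to zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))

lasso_beta = penalized_fit(X, y, 0.5 * lam_max, "lasso")
ridge_beta = penalized_fit(X, y, 0.5 * lam_max, "ridge")
null_beta = penalized_fit(X, y, 1.01 * lam_max, "lasso")

print("nonzero lasso coefficients:", sum(b != 0.0 for b in lasso_beta))
print("nonzero ridge coefficients:", sum(b != 0.0 for b in ridge_beta))
```

Ordinary least squares has no unique solution here because there are more SNPs than samples, whereas both penalized fits return a single coefficient vector. The lasso sets coefficients exactly to zero through the soft-thresholding step, while ridge shrinks coefficients without zeroing them. The penalty level `0.5 * lam_max` is arbitrary; in practice it would be chosen by a procedure such as cross-validation.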
Pages: 10