Uncertainty quantification in high-dimensional linear models incorporating graphical structures with applications to gene set analysis

Times cited: 0
Authors
Tan, Xiangyong [1]
Zhang, Xiao [2]
Cui, Yuehua [3]
Liu, Xu [4, 5]
Affiliations
[1] Jiangxi Univ Finance & Econ, Sch Stat & Data Sci, Nanchang, Peoples R China
[2] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[3] Michigan State Univ, Dept Stat & Probabil, E Lansing, MI 48823 USA
[4] Shanghai Univ Finance & Econ, Sch Stat & Management, Guoding Rd, Shanghai 200433, Peoples R China
[5] Yunnan Univ, Yunnan Key Lab Stat Modeling & Data Anal, Kunming 650500, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
VARIABLE SELECTION; CONFIDENCE-INTERVALS; P-VALUES; REGRESSION; LASSO; REGULARIZATION; SHRINKAGE
DOI
10.1093/bioinformatics/btae541
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The functions of genes in networks are typically correlated due to their functional connectivity. Variable selection methods have been developed to select important genes associated with a trait while incorporating network graphical information. However, no method has been proposed to quantify the uncertainty of individual genes under such settings.
Results: In this paper, we construct confidence intervals (CIs) and provide P-values for the parameters of a high-dimensional linear model incorporating graphical structures, where the number of variables p diverges with the number of observations. To incorporate the graphical information, we propose a graph-constrained desparsified LASSO (least absolute shrinkage and selection operator) (GCDL) estimator, which dramatically reduces the influence of high correlation among predictors and offers faster computation and higher accuracy than the desparsified LASSO. Theoretical results show that the GCDL estimator achieves asymptotic normality. We also establish its uniform convergence, from which an explicit expression for the uniform CI can be derived. Extensive numerical results indicate that the GCDL estimator and its (uniform) CI perform well even when predictors are highly correlated.
Availability and implementation: An R package implementing the proposed method is available at https://github.com/XiaoZhangryy/gcdl.
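For orientation, the standard desparsified LASSO that the abstract uses as its point of comparison can be written as below. This is a sketch of the usual textbook form only, not the GCDL estimator, whose graph-constrained construction is not given in this record. The symbols are assumptions introduced here for illustration: y is the response, X the design matrix with columns x_j, \hat\beta the LASSO estimate, z_j the residual from a LASSO regression of x_j on the remaining columns, \hat\sigma an estimate of the noise level, and \Phi the standard normal distribution function.

```latex
% Standard desparsified LASSO for coordinate j (illustrative; not GCDL).
% Assumed notation: \hat\beta = LASSO fit, z_j = nodewise LASSO residual of x_j.
\[
  \hat b_j \;=\; \hat\beta_j \;+\; \frac{z_j^{\top}\,(y - X\hat\beta)}{z_j^{\top} x_j}
\]
% Under suitable sparsity and design conditions, \hat b_j is approximately normal:
\[
  \hat b_j - \beta_j \;\approx\; \mathcal{N}\!\left(0,\; \sigma^2\,\frac{\lVert z_j\rVert_2^2}{(z_j^{\top} x_j)^2}\right)
\]
% Resulting (1-\alpha) confidence interval and two-sided P-value for H_0: \beta_j = 0:
\[
  \hat b_j \;\pm\; \Phi^{-1}\!\left(1-\tfrac{\alpha}{2}\right)\,\hat\sigma\,
  \frac{\lVert z_j\rVert_2}{\lvert z_j^{\top} x_j\rvert},
  \qquad
  P_j \;=\; 2\left\{1 - \Phi\!\left(\frac{\lvert \hat b_j\rvert\,\lvert z_j^{\top} x_j\rvert}{\hat\sigma\,\lVert z_j\rVert_2}\right)\right\}
\]
```

The second term in the first display is the bias correction that removes the shrinkage of the LASSO along coordinate j, which is what makes a normal approximation, and hence CIs and P-values, available.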
Pages: 11