Markov Neighborhood Regression for High-Dimensional Inference

Cited by: 8
Authors
Liang, Faming [1]
Xue, Jingnan [2]
Jia, Bochao [3]
Affiliations
[1] Purdue Univ, Dept Stat, W Lafayette, IN 47906 USA
[2] Houzz, Palo Alto, CA USA
[3] Eli Lilly & Co, Lilly Corp Ctr, Indianapolis, IN 46285 USA
Keywords
Causal structure discovery; Confidence interval; Gaussian graphical model; p-Value; Post-selection inference; Nonconcave penalized likelihood; Variable selection; Confidence intervals; Adaptive lasso; Breast cancer; Linear models; P-values; Discovery; Resistance
DOI
10.1080/01621459.2020.1841646
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
This article proposes an innovative method for constructing confidence intervals and assessing p-values in statistical inference for high-dimensional linear models. The proposed method has successfully broken the high-dimensional inference problem into a series of low-dimensional inference problems: for each regression coefficient β_i, the confidence interval and p-value are computed by regressing on a subset of variables selected according to the conditional independence relations between the corresponding variable X_i and other variables. Since the subset of variables forms a Markov neighborhood of X_i in the Markov network formed by all the variables X_1, X_2, ..., X_p, the proposed method is coined as Markov neighborhood regression (MNR). The proposed method is tested on high-dimensional linear, logistic, and Cox regression. The numerical results indicate that the proposed method significantly outperforms the existing ones. Based on the MNR, a method of learning causal structures for high-dimensional linear models is proposed and applied to identification of drug sensitive genes and cancer driver genes. The idea of using conditional independence relations for dimension reduction is general and potentially can be extended to other high-dimensional or big data problems as well. Supplementary materials for this article are available online.
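The low-dimensional step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the Markov neighborhood of X_j and a set of variables selected for the response are already available (the abstract does not say how they are estimated), fits ordinary least squares on their union, and uses a normal approximation for the confidence interval and p-value. The function name `mnr_coefficient`, its parameters, and the simulated chain-structured data are all hypothetical.

```python
import math
import numpy as np


def mnr_coefficient(X, y, j, neighborhood, selected, z_crit=1.96):
    """Illustrative low-dimensional regression for coefficient j.

    neighborhood: indices assumed to form a Markov neighborhood of X_j.
    selected: indices assumed to come from a separate variable-selection step.
    Both are taken as given here; estimating them is outside this sketch.
    """
    # Regress only on X_j, its neighborhood, and the selected variables.
    D = sorted({j} | set(neighborhood) | set(selected))
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X[:, D]])  # add an intercept column
    coef, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = float(resid @ resid) / (n - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    k = 1 + D.index(j)  # position of X_j after the intercept
    est = float(coef[k])
    se = math.sqrt(cov[k, k])
    # Two-sided p-value under a normal approximation (an assumption).
    p_value = math.erfc(abs(est / se) / math.sqrt(2.0))
    return est, (est - z_crit * se, est + z_crit * se), p_value


# Simulated example with more variables than observations (p > n).
rng = np.random.default_rng(0)
n, p, rho = 150, 200, 0.5
X = np.empty((n, p))
X[:, 0] = rng.standard_normal(n)
for k in range(1, p):
    # AR(1) chain: the Markov neighborhood of X_k is {k-1, k+1}.
    X[:, k] = rho * X[:, k - 1] + math.sqrt(1 - rho**2) * rng.standard_normal(n)
y = 2.0 * X[:, 0] + rng.standard_normal(n)

# Only X_0 enters the model; its chain neighbor is X_1.
est0, ci0, p0 = mnr_coefficient(X, y, j=0, neighborhood={1}, selected={0})
# X_5 has true coefficient zero; its chain neighbors are X_4 and X_6.
est5, ci5, p5 = mnr_coefficient(X, y, j=5, neighborhood={4, 6}, selected={0})
```

In this sketch each regression involves at most four covariates even though p = 200, which is the dimension reduction the abstract refers to. The neighborhoods are known here only because the data are simulated from a chain; with real data they would have to be estimated.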
Pages: 1200-1214
Number of pages: 15