Inference in High-Dimensional Multivariate Response Regression with Hidden Variables

Cited by: 2
Authors
Bing, Xin [1]
Cheng, Wei [2]
Feng, Huijie [3]
Ning, Yang [4]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
[2] Brown Univ, Ctr Computat Mol Biol, Providence, RI 02912 USA
[3] Microsoft, Bellevue, WA USA
[4] Cornell Univ, Dept Stat & Data Sci, Ithaca, NY 14850 USA
Keywords
Confidence intervals; Confounding; Hidden variables; High-dimensional regression; Hypothesis testing; Multivariate response regression; Surrogate variable analysis; CONFIDENCE-REGIONS; NUMBER; INTERVALS; SELECTION; MODELS; TESTS;
DOI
10.1080/01621459.2023.2241701
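Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract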
This article studies the inference of the regression coefficient matrix under multivariate response linear regressions in the presence of hidden variables. A novel procedure for constructing confidence intervals of entries of the coefficient matrix is proposed. Our method first uses the multivariate nature of the responses by estimating and adjusting the hidden effect to construct an initial estimator of the coefficient matrix. By further deploying a low-dimensional projection procedure to reduce the bias introduced by the regularization in the previous step, a refined estimator is proposed and shown to be asymptotically normal. The asymptotic variance of the resulting estimator is derived with closed-form expression and can be consistently estimated. In addition, we propose a testing procedure for the existence of hidden effects and provide its theoretical justification. Both our procedures and their analyses are valid even when the feature dimension and the number of responses exceed the sample size. Our results are further backed up via extensive simulations and a real data analysis. Supplementary materials for this article are available online.
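The confounding problem that motivates the abstract can be illustrated with a small simulation. This is a minimal low-dimensional sketch, not the authors' procedure: the model form, the names `theta`, `A`, `B`, and all dimensions are assumptions chosen for illustration. It only shows that an ordinary least-squares fit that ignores hidden variables correlated with the features recovers the coefficient matrix plus a hidden-effect term, which is why the hidden effect has to be estimated and adjusted for before inference.

```python
import numpy as np

def simulate(n=5000, p=3, m=4, k=1, seed=0):
    """Draw data from an assumed model Y = X @ theta + Z @ B + E,
    where the hidden Z is correlated with X through Z = X @ A + W."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(p, m))       # coefficient matrix of interest
    A = np.ones((p, k))                   # assumed link from X to hidden Z
    B = np.ones((k, m))                   # assumed hidden effect on Y
    X = rng.normal(size=(n, p))           # observed features
    Z = X @ A + rng.normal(size=(n, k))   # hidden variables (not observed)
    Y = X @ theta + Z @ B + rng.normal(size=(n, m))
    return X, Y, theta, A, B

X, Y, theta, A, B = simulate()

# Naive least squares of Y on X, ignoring Z entirely.
naive = np.linalg.lstsq(X, Y, rcond=None)[0]

# The naive fit targets theta + A @ B, not theta.
print("max |naive - (theta + A B)|:", np.abs(naive - (theta + A @ B)).max())
print("min |naive - theta|        :", np.abs(naive - theta).min())
```

In this toy setting every entry of the naive estimate is off by roughly the corresponding entry of `A @ B`, so confidence intervals centred on it would miss `theta` no matter how large the sample is.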
Pages: 2066-2077
Page count: 12