Modular Regression: Improving Linear Models by Incorporating Auxiliary Data

Cited by: 0
Authors
Jin, Ying [1]
Rothenhausler, Dominik [1]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Keywords
Data fusion; high dimensional statistics; missing data; regression; semiparametric efficiency; surrogates; CONFIDENCE-INTERVALS; CLINICAL-TRIALS; END-POINTS; SURROGATE; SELECTION; PARAMETERS; LASSO; BIAS;
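DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract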
This paper develops a new framework, called modular regression, to utilize auxiliary information (such as variables other than the original features, or additional data sets) in the training of linear models. At a high level, our method follows a three-step routine: (i) decompose the regression task into several sub-tasks, (ii) fit the sub-task models, and (iii) use the sub-task models to provide an improved estimate for the original regression problem. This routine applies to widely used low-dimensional (generalized) linear models and to high-dimensional regularized linear regression. It also extends naturally to missing-data settings where only partial observations are available. By incorporating auxiliary information, our approach improves estimation efficiency and prediction accuracy over linear regression or the Lasso, under a conditional independence assumption for predicting the outcome. For high-dimensional settings, we develop an extension of our procedure that is robust to violations of the conditional independence assumption, in the sense that it improves efficiency if this assumption holds and coincides with the Lasso otherwise. We demonstrate the efficacy of our methods with simulated and real data sets.
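The three-step routine in the abstract can be illustrated with a small numerical sketch. It is not the authors' estimator: it assumes a single auxiliary variable Z with the outcome Y conditionally independent of the features X given Z, it assumes linear sub-task models for E[X | Z] and E[Y | Z], and the helper names `fit_linear` and `modular_ols` are hypothetical. Under that conditional independence assumption, the mean of the product of the two residuals is zero, so the sketch drops that term when estimating the cross-moment E[XY].

```python
import numpy as np


def fit_linear(Z, T):
    """Least-squares fit of T on [1, Z]; returns the fitted values."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, T, rcond=None)
    return A @ coef


def modular_ols(X, Y, Z):
    """Illustrative modular estimate of the slope of Y on X (assumes Y indep. of X given Z)."""
    # Center so that no intercept is needed in the final regression.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean()

    # (i) + (ii): split the task into "X given Z" and "Y given Z", fit each sub-model.
    mx = fit_linear(Z, X)          # fitted E[X | Z], shape (n, p)
    my = fit_linear(Z, Y)          # fitted E[Y | Z], shape (n,)
    rx, ry = X - mx, Y - my        # sub-task residuals

    # (iii): E[XY] = E[mx*my] + E[mx*ry] + E[rx*my] + E[rx*ry].
    # The last term has mean zero under the assumed conditional independence,
    # so it is omitted; this is where the auxiliary variable Z helps.
    cross = (mx * my[:, None] + mx * ry[:, None] + rx * my[:, None]).mean(axis=0)
    gram = X.T @ X / len(X)
    return np.linalg.solve(gram, cross)


# Simulated data in which Y depends on X only through Z (an assumption of this sketch).
rng = np.random.default_rng(0)
n = 20000
Z = rng.normal(size=(n, 1))
X = Z + rng.normal(size=(n, 1))
Y = 2.0 * Z[:, 0] + rng.normal(size=n)

theta_mod = modular_ols(X, Y, Z)
theta_ols = np.linalg.lstsq(X - X.mean(axis=0), Y - Y.mean(), rcond=None)[0]
print("modular:", theta_mod, "plain OLS:", theta_ols)
```

In this simulation the population slope of Y on X is Cov(X, Y) / Var(X) = 2 / 2 = 1, so both estimates should land near 1. The modular version reaches it through the two sub-task fits instead of the raw sample cross-moment.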
Pages: 52