Stable prediction in high-dimensional linear models

Cited by: 18
Authors
Lin, Bingqing [1 ]
Wang, Qihua [1 ,2 ]
Zhang, Jun [1 ]
Pang, Zhen [3 ]
Affiliations
[1] Shenzhen Univ, Inst Stat Sci, Coll Math & Stat, Shenzhen 518060, Peoples R China
[2] Chinese Acad Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China
[3] Hong Kong Polytech Univ, Dept Appl Math, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Model averaging; Variable selection; Penalized regression; Screening; VARIABLE SELECTION; REGRESSION;
DOI
10.1007/s11222-016-9694-6
CLC classification
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
We propose a Random Splitting Model Averaging procedure, RSMA, to achieve stable prediction in high-dimensional linear models. The idea is to use the training split of the data to construct and estimate candidate models, and the test split to form a second-level dataset. This second-level dataset is then used to estimate optimal weights for the candidate models by quadratic optimization under non-negativity constraints. The procedure has three appealing features: (1) RSMA avoids model overfitting and, as a result, improves prediction accuracy. (2) By adaptively choosing optimal weights, it yields more stable predictions through averaging over several candidate models. (3) Based on RSMA, a weighted importance index is proposed to rank the predictors and discriminate relevant predictors from irrelevant ones. Simulation studies and a real data analysis demonstrate that the RSMA procedure has excellent predictive performance and that the associated weighted importance index ranks the predictors well.
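The split-then-average idea in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' RSMA implementation: it assumes a single train/test split (RSMA repeats over random splits), uses ridge fits on random predictor subsets as stand-in candidate models, and solves for non-negative averaging weights by least squares on the second-level data. All names and parameter choices here are hypothetical.

```python
# Hedged sketch of random-splitting model averaging (illustration only;
# candidate models and tuning choices are assumptions, not the paper's).
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Toy high-dimensional data: n = 100 samples, p = 200 predictors,
# only the first 5 predictors are truly relevant.
n, p = 100, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.standard_normal(n)

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge estimate -- stands in for any candidate fit."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

B = 20                           # number of candidate models
s = 10                           # predictors per candidate (screening stand-in)
train = rng.permutation(n)[: n // 2]
test = np.setdiff1d(np.arange(n), train)

preds = np.empty((test.size, B))  # second-level data: test-split predictions
supports = []
for b in range(B):
    S = rng.choice(p, size=s, replace=False)        # random predictor subset
    coef = fit_ridge(X[np.ix_(train, S)], y[train])
    preds[:, b] = X[np.ix_(test, S)] @ coef
    supports.append(S)

# Optimal weights: non-negative least squares on the second-level data,
# a special case of quadratic optimization under non-negativity constraints.
w, _ = nnls(preds, y[test])
if w.sum() > 0:
    w = w / w.sum()              # normalise to a convex combination

# Weighted importance index: total weight of models containing predictor j.
importance = np.zeros(p)
for b, S in enumerate(supports):
    importance[S] += w[b]
```

The averaged prediction for a new point is then the weight-`w` combination of the candidate models' predictions, and `importance` gives one plausible ranking of predictors in the spirit of the paper's weighted importance index.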
Pages: 1401-1412
Page count: 12
Related papers
50 items in total
  • [41] Stabilizing High-Dimensional Prediction Models Using Feature Graphs
    Gopakumar, Shivapratap
    Truyen Tran
    Tu Dinh Nguyen
    Dinh Phung
    Venkatesh, Svetha
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2015, 19 (03) : 1044 - 1052
  • [42] Linear screening for high-dimensional computer experiments
    Li, Chunya
    Chen, Daijun
    Xiong, Shifeng
    STAT, 2021, 10 (01):
  • [43] Partial profile score feature selection in high-dimensional generalized linear interaction models
    Xu, Zengchao
    Luo, Shan
    Chen, Zehua
    STATISTICS AND ITS INTERFACE, 2022, 15 (04) : 433 - 447
  • [44] Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models
    Veerman, Jurre R.
    Leday, Gwenael G. R.
    van de Wiel, Mark A.
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2022, 51 (01) : 116 - 134
  • [45] An iterative matrix uncertainty selector for high-dimensional generalized linear models with measurement errors
    Fesuh Nono, Betrand
    Nguefack-Tsague, Georges
    Kegnenlezom, Martin
    Nguema, Eugene-Patrice N.
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2025,
  • [46] SEPARATION OF COVARIATES INTO NONPARAMETRIC AND PARAMETRIC PARTS IN HIGH-DIMENSIONAL PARTIALLY LINEAR ADDITIVE MODELS
    Lian, Heng
    Liang, Hua
    Ruppert, David
    STATISTICA SINICA, 2015, 25 (02) : 591 - 607
  • [47] The EBIC and a sequential procedure for feature selection in interactive linear models with high-dimensional data
    He, Yawei
    Chen, Zehua
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2016, 68 (01) : 155 - 180
  • [48] Linear Hypothesis Testing in Dense High-Dimensional Linear Models
    Zhu, Yinchu
    Bradic, Jelena
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2018, 113 (524) : 1583 - 1600
  • [49] Group inference for high-dimensional mediation models
    Yu, Ke
    Guo, Xu
    Luo, Shan
    STATISTICS AND COMPUTING, 2025, 35 (03)
  • [50] Introduction to variational Bayes for high-dimensional linear and logistic regression models
    Jang, Insong
    Lee, Kyoungjae
    KOREAN JOURNAL OF APPLIED STATISTICS, 2022, 35 (03) : 445 - 455