Factor Analysis Regression for Predictive Modeling with High-Dimensional Data

Authors
Randy Carter
Netsanet Michael
Affiliations
[1] State University of New York at Buffalo, Department of Biostatistics
[2] The Boeing Company, Boeing Commercial Airplanes
Source
Journal of Quantitative Economics | 2022 / Vol. 20 (Suppl. 1)
Keywords
Bilinear factor model; Principal component analysis; Principal component regression; Partial least squares; Factor structure covariance matrix; Factor analysis regression; Mean square error of prediction; Monte Carlo studies; Cross-validation
DOI
Not available
Abstract
Factor analysis regression (FAR) of $y_i$ on $\boldsymbol{x}_i = (x_{1i}, x_{2i}, \ldots, x_{pi})$, $i = 1, 2, \ldots, n$, has been studied only in the low-dimensional case ($p < n$), using maximum likelihood (ML) factor extraction. The ML method breaks down in high-dimensional cases ($p > n$). In this paper, we develop a high-dimensional version of FAR based on a computationally efficient method of factor extraction. We compare the performance of our high-dimensional FAR with partial least squares regression (PLSR) and principal component regression (PCR) under three underlying correlation structures: arbitrary correlation, factor model correlation structure, and when $y$ is independent of $\boldsymbol{x}$.
Under each structure, we generated Monte Carlo training samples of sizes $n < p$ from a multivariate normal distribution with that structure. Parameters were fixed at estimates obtained from analyses of real data sets. Under the independence structure, we observed severe over-fitting by PLSR compared to FAR and PCR. Under the two dependent structures, FAR had a notably better average mean square error of prediction than PCR. The performance of FAR and PLSR was not notably different under the dependent structures. Thus, overall, FAR performed better than either PLSR or PCR.
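
The kind of Monte Carlo comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not specify the FAR factor-extraction method, so only the two comparators (PCR and a single-response PLS fitted by NIPALS) are shown, under a factor-model structure with $n < p$. The function names, the loadings, the factor weights, and the sample sizes are all hypothetical choices made for illustration, whereas the paper fixed its parameters at estimates from real data sets.

```python
import numpy as np


def draw_sample(n, loadings, beta, rng, noise_sd=1.0):
    """Draw (X, y) from a factor model: X = F L' + E, y = F beta + e."""
    p, k = loadings.shape
    factors = rng.normal(size=(n, k))
    x = factors @ loadings.T + noise_sd * rng.normal(size=(n, p))
    y = factors @ beta + noise_sd * rng.normal(size=n)
    return x, y


def pcr_fit_predict(x_train, y_train, x_test, n_components):
    """Principal component regression via SVD (valid when p > n)."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean()
    xc = x_train - x_mean
    # Rows of vt are the principal directions of the centered predictors.
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    v = vt[:n_components].T
    scores = xc @ v
    coef, *_ = np.linalg.lstsq(scores, y_train - y_mean, rcond=None)
    return y_mean + (x_test - x_mean) @ v @ coef


def pls1_fit_predict(x_train, y_train, x_test, n_components):
    """Single-response partial least squares (NIPALS with deflation)."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean()
    xk, yk = x_train - x_mean, y_train - y_mean
    weights, x_loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = xk.T @ yk
        w = w / np.linalg.norm(w)
        t = xk @ w
        tt = t @ t
        p_load = xk.T @ t / tt
        q = yk @ t / tt
        xk = xk - np.outer(t, p_load)  # deflate predictors
        yk = yk - q * t                # deflate response
        weights.append(w)
        x_loadings.append(p_load)
        y_loadings.append(q)
    w_mat = np.array(weights).T
    p_mat = np.array(x_loadings).T
    coef = w_mat @ np.linalg.solve(p_mat.T @ w_mat, np.array(y_loadings))
    return y_mean + (x_test - x_mean) @ coef


def average_msep(fit_predict, n_reps=20, n=30, p=100, k=3, n_test=200, seed=0):
    """Average mean square error of prediction over Monte Carlo replicates."""
    rng = np.random.default_rng(seed)
    loadings = rng.normal(size=(p, k))  # hypothetical loadings, fixed across replicates
    beta = np.ones(k)                   # hypothetical factor-to-response weights
    mseps = []
    for _ in range(n_reps):
        x_tr, y_tr = draw_sample(n, loadings, beta, rng)
        x_te, y_te = draw_sample(n_test, loadings, beta, rng)
        pred = fit_predict(x_tr, y_tr, x_te, k)
        mseps.append(np.mean((y_te - pred) ** 2))
    return float(np.mean(mseps))


if __name__ == "__main__":
    print("PCR  average MSEP:", average_msep(pcr_fit_predict))
    print("PLS1 average MSEP:", average_msep(pls1_fit_predict))
```

Setting `beta` to zeros in `average_msep` would mimic the independence structure, in which any apparent fit on the training sample is over-fitting.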
Pages: 115-132
Number of pages: 17