Factor Analysis Regression for Predictive Modeling with High-Dimensional Data

Cited by: 0
Authors
Randy Carter
Netsanet Michael
Affiliations
[1] State University of New York at Buffalo, Department of Biostatistics
[2] The Boeing Company, Boeing Commercial Airplanes
Keywords
Bilinear factor model; Principal component analysis; Principal component regression; Partial least squares; Factor structure covariance matrix; Factor analysis regression; Mean square error of prediction; Monte Carlo studies; Cross-validation
DOI
Not available
Abstract
Factor analysis regression (FAR) of $y_i$ on $\mathbf{x}_i=(x_{1i},x_{2i},\ldots,x_{pi})$, $i = 1,2,\ldots,n$, has been studied only in the low-dimensional case ($p<n$), using maximum likelihood (ML) factor extraction. The ML method breaks down in high-dimensional cases ($p>n$). In this paper, we develop a high-dimensional version of FAR based on a computationally efficient method of factor extraction. We compare the performance of our high-dimensional FAR with partial least squares regression (PLSR) and principal component regression (PCR) under three underlying correlation structures: arbitrary correlation, a factor-model correlation structure, and independence of $y$ and $\mathbf{x}$.
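The abstract does not spell out the factor-extraction method, so the following is only a sketch, under stated assumptions, of how a factor-structured regression can be computed when $p>n$. It fits a $k$-factor model to the correlation matrix of $(y,\mathbf{x})$ and predicts with $\hat{y} = \Sigma_{yx}\Sigma_{xx}^{-1}\mathbf{x}$, where $\Sigma_{yx}=\boldsymbol{\lambda}_y^\top\Lambda_x^\top$ and $\Sigma_{xx} = \Lambda_x\Lambda_x^\top + \Psi_x$ is inverted through the Woodbury identity, so only a $k\times k$ system is solved. Iterated principal-axis factoring stands in for the authors' extraction method (an assumption). The function names `principal_axis_factors`, `far_fit` and `far_predict`, the starting communality of 0.5, the uniqueness floor, and the simulated data are all hypothetical.

```python
import numpy as np


def principal_axis_factors(R, k, n_iter=50, floor=0.01):
    """Iterated principal-axis factoring of a correlation matrix R.

    Stand-in for the paper's extraction method (an assumption). Squared
    multiple correlations need R^{-1}, which does not exist when p > n,
    so the communalities start from a constant instead.
    """
    h = np.full(R.shape[0], 0.5)  # hypothetical starting communalities
    for _ in range(n_iter):
        R_red = R.copy()
        np.fill_diagonal(R_red, h)  # reduced correlation matrix
        vals, vecs = np.linalg.eigh(R_red)  # ascending eigenvalues
        vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]  # keep the top k
        L = vecs * np.sqrt(np.clip(vals, 0.0, None))  # (p+1) x k loadings
        h = np.clip((L ** 2).sum(axis=1), 0.0, 1.0 - floor)
    return L, 1.0 - h  # loadings and uniquenesses (diagonal of Psi)


def far_fit(X, y, k):
    """Fit a k-factor model to the joint correlation matrix of (y, x)."""
    Z = np.column_stack([y, X])  # column 0 is y
    R = np.corrcoef(Z, rowvar=False)
    L, psi = principal_axis_factors(R, k)
    return {"mu": Z.mean(axis=0), "sd": Z.std(axis=0, ddof=1), "L": L, "psi": psi}


def far_predict(model, X_new):
    """Predict y as Sigma_yx Sigma_xx^{-1} x on the standardized scale."""
    mu, sd, L, psi = model["mu"], model["sd"], model["L"], model["psi"]
    Xs = (X_new - mu[1:]) / sd[1:]
    lam_y, Lx, psi_x = L[0], L[1:], psi[1:]
    s_xy = Lx @ lam_y  # model-implied cov(x, y)
    # Woodbury: (Lx Lx' + Psi)^{-1} s = Psi^{-1} s - Psi^{-1} Lx M^{-1} Lx' Psi^{-1} s
    PiL = Lx / psi_x[:, None]  # Psi^{-1} Lx, shape p x k
    M = np.eye(Lx.shape[1]) + Lx.T @ PiL  # only a k x k system is solved
    beta = s_xy / psi_x - PiL @ np.linalg.solve(M, PiL.T @ s_xy)
    return mu[0] + sd[0] * (Xs @ beta)  # back to the original y scale


# Toy factor-model data with p > n (all settings are illustrative).
rng = np.random.default_rng(0)
n, p, k = 50, 200, 2
A = 0.7 * rng.normal(size=(p, k))  # hypothetical x-loadings
b = np.array([1.0, 0.5])  # hypothetical y-loadings


def draw(m):
    F = rng.normal(size=(m, k))
    return F @ A.T + rng.normal(size=(m, p)), F @ b + 0.3 * rng.normal(size=m)


X_train, y_train = draw(n)
X_test, y_test = draw(1000)
model = far_fit(X_train, y_train, k)
y_hat = far_predict(model, X_test)
msep_far = np.mean((y_test - y_hat) ** 2)
print(f"FAR MSEP: {msep_far:.3f}  (Var(y_test) = {np.var(y_test):.3f})")
```

Because $\Psi_x$ is diagonal with strictly positive entries, the model-implied $\Sigma_{xx}$ is positive definite even though the sample covariance of $\mathbf{x}$ is singular when $p>n$. That is the property that lets a factor-based regression proceed where ordinary least squares cannot.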
Under each structure, we generated Monte Carlo training samples of size $n<p$ from a multivariate normal distribution with that correlation structure, with parameters fixed at estimates obtained from analyses of real data sets. Under the independence structure, PLSR over-fitted severely compared with FAR and PCR. Under the two dependent structures, FAR achieved a notably lower average mean square error of prediction (MSEP) than PCR, while the performance of FAR and PLSR did not differ notably. Overall, therefore, FAR performed better than either PLSR or PCR.
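To make the over-fitting comparison concrete, here is a small Monte Carlo sketch in the same spirit. It implements PCR and PLS1 (via NIPALS) with NumPy, draws training samples with $n<p$ under the independence structure, and averages the training MSE and the MSEP over replications. Standard normal $\mathbf{x}$ and $y$ are an assumption here, not the paper's real-data-based parameters. PCR chooses its components without looking at $y$, whereas PLS chooses directions that maximize covariance with $y$. With $y$ independent of $\mathbf{x}$, PLS can therefore fit training noise closely, while any method's expected MSEP stays at or above $\operatorname{Var}(y)$. The sample sizes, number of components, seed, and helper names (`pcr_fit`, `pls1_fit`, `train_and_test_mse`, `monte_carlo`) are illustrative, and the printed numbers come from this toy setting, not from the paper.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Principal component regression: OLS of y on the top-k PC scores of X."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T  # p x k principal directions
    gamma = np.linalg.lstsq(Xc @ V, y - ym, rcond=None)[0]
    return xm, ym, V @ gamma  # coefficients on the x scale


def pls1_fit(X, y, k):
    """PLS1 regression via NIPALS with k latent components."""
    xm, ym = X.mean(axis=0), y.mean()
    E, f = X - xm, y - ym
    W, P, q = [], [], []
    for _ in range(k):
        w = E.T @ f
        w /= np.linalg.norm(w)  # weight vector
        t = E @ w  # score vector
        tt = t @ t
        p_load, c = E.T @ t / tt, f @ t / tt  # x- and y-loadings
        E, f = E - np.outer(t, p_load), f - c * t  # deflate X and y
        W.append(w)
        P.append(p_load)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return xm, ym, W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^{-1} q


def train_and_test_mse(fit, X, y, X_new, y_new, k):
    xm, ym, beta = fit(X, y, k)
    train = np.mean((y - ym - (X - xm) @ beta) ** 2)
    test = np.mean((y_new - ym - (X_new - xm) @ beta) ** 2)
    return train, test


def monte_carlo(fit, n=30, p=200, k=3, n_test=500, reps=50, seed=1):
    """Average training MSE and MSEP when y is independent of x."""
    rng = np.random.default_rng(seed)  # same seed -> same data for every method
    out = np.zeros((reps, 2))
    for r in range(reps):
        X, y = rng.normal(size=(n, p)), rng.normal(size=n)
        X_new, y_new = rng.normal(size=(n_test, p)), rng.normal(size=n_test)
        out[r] = train_and_test_mse(fit, X, y, X_new, y_new, k)
    return out.mean(axis=0)


pcr_train, pcr_test = monte_carlo(pcr_fit)
pls_train, pls_test = monte_carlo(pls1_fit)
print(f"PCR: train MSE {pcr_train:.3f}, MSEP {pcr_test:.3f}")
print(f"PLS: train MSE {pls_train:.3f}, MSEP {pls_test:.3f}")
```

Reusing the same seed for both methods gives them identical simulated data sets (common random numbers), so the gap between the two rows reflects the methods rather than sampling noise. A large gap between training MSE and MSEP is the signature of over-fitting.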
Pages: 115-132
Number of pages: 17