Factor Analysis Regression for Predictive Modeling with High-Dimensional Data

Cited by: 0
Authors
Randy Carter
Netsanet Michael
Affiliations
[1] State University of New York at Buffalo, Department of Biostatistics
[2] The Boeing Company, Boeing Commercial Airplanes
Source
Journal of Quantitative Economics | 2022 / Vol. 20
Keywords
Bilinear factor model; Principal component analysis; Principal component regression; Partial least squares; Factor structure covariance matrix; Factor analysis regression; Mean square error of prediction; Monte Carlo studies; Cross-validation
DOI
Not available
Abstract
Factor analysis regression (FAR) of $y_i$ on $\mathbf{x}_i = (x_{1i}, x_{2i}, \ldots, x_{pi})$, $i = 1, 2, \ldots, n$, has been studied only in the low-dimensional case ($p < n$), using maximum likelihood (ML) factor extraction. The ML method breaks down in high-dimensional cases ($p > n$). In this paper, we develop a high-dimensional version of FAR based on a computationally efficient method of factor extraction. We compare the performance of our high-dimensional FAR with partial least squares regression (PLSR) and principal component regression (PCR) under three underlying correlation structures: arbitrary correlation, a factor-model correlation structure, and independence of $y$ and $\mathbf{x}$.
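FAR can be illustrated with a minimal NumPy sketch. The formalization below is assumed here and not taken from the paper. It treats $(y, \mathbf{x})$ as jointly following a $k$-factor model with covariance $\Lambda\Lambda^\top + \Psi$. From that model it reads off $\boldsymbol{\sigma}_{xy} = \Lambda_x \boldsymbol{\lambda}_y$ and $\Sigma_{xx} = \Lambda_x\Lambda_x^\top + \Psi_x$, and it predicts with $\hat{\boldsymbol{\beta}} = \Sigma_{xx}^{-1}\boldsymbol{\sigma}_{xy}$. The Woodbury identity is used so that no $p \times p$ matrix is ever inverted, which keeps the fit usable when $p > n$. The factor-extraction step is a stand-in chosen for simplicity: principal-component factoring computed from a thin SVD. It is not necessarily the extraction method developed in the paper. The function names `fit_far` and `predict_far`, the number of factors `k`, the floor `psi_floor`, and the simulated data are all hypothetical.

```python
import numpy as np


def fit_far(X, y, k, psi_floor=1e-3):
    """Hypothetical FAR sketch for p > n.

    Assumption: (y, x) jointly follow a k-factor model, and loadings are
    extracted by principal-component factoring (a stand-in, not necessarily
    the paper's extraction method).
    """
    Z = np.column_stack([y, X])              # n x (p+1) joint data, y in column 0
    n = Z.shape[0]
    mu = Z.mean(axis=0)
    Zc = Z - mu
    # Thin SVD of the centered data yields the top-k eigenpairs of the sample
    # covariance without forming the (p+1) x (p+1) covariance matrix.
    _, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    L = Vt[:k].T * (s[:k] / np.sqrt(n - 1))  # (p+1) x k loadings
    var = (Zc ** 2).sum(axis=0) / (n - 1)    # sample variances
    # Rank-k PCA communalities never exceed the sample variances, so the
    # floor only guards the uniquenesses against round-off.
    psi = np.maximum(var - (L ** 2).sum(axis=1), psi_floor)
    Lx, ly, psi_x = L[1:], L[0], psi[1:]
    sigma_xy = Lx @ ly                       # model-implied cov(x, y)
    # Woodbury: (Lx Lx' + Psi)^{-1} v
    #   = Psi^{-1} v - Psi^{-1} Lx (I + Lx' Psi^{-1} Lx)^{-1} Lx' Psi^{-1} v
    PinvL = Lx / psi_x[:, None]              # Psi_x^{-1} Lx, p x k
    M = np.eye(k) + Lx.T @ PinvL             # small k x k system
    beta = sigma_xy / psi_x - PinvL @ np.linalg.solve(M, PinvL.T @ sigma_xy)
    return mu[0], mu[1:], beta


def predict_far(model, X_new):
    y_mean, x_mean, beta = model
    return y_mean + (X_new - x_mean) @ beta


# Usage on simulated data with n < p (toy parameters, not the paper's).
rng = np.random.default_rng(0)
n, p, k = 50, 200, 2
Lam = rng.normal(size=(p, k))
gamma = np.array([1.0, -1.0])


def simulate(m):
    F = rng.normal(size=(m, k))              # latent factors
    X = F @ Lam.T + rng.normal(size=(m, p))
    y = F @ gamma + 0.5 * rng.normal(size=m)
    return X, y


X_tr, y_tr = simulate(n)
X_te, y_te = simulate(1000)
model = fit_far(X_tr, y_tr, k)
msep_far = np.mean((y_te - predict_far(model, X_te)) ** 2)
msep_mean = np.mean((y_te - y_tr.mean()) ** 2)  # predict-the-mean baseline
print(f"MSEP FAR: {msep_far:.3f}, MSEP mean-only: {msep_mean:.3f}")
```

Solving the $k \times k$ system instead of a $p \times p$ one is what keeps this kind of factor-structured regression cheap in the $p > n$ regime. Any factor-extraction method that returns $\Lambda$ and a diagonal $\Psi$ could be dropped into the same prediction step.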
Under each structure, we generated Monte Carlo training samples of size $n < p$ from a multivariate normal distribution with that structure. Parameters were fixed at estimates obtained from analyses of real data sets. Under the independence structure, we observed severe over-fitting by PLSR compared with FAR and PCR. Under the two dependent structures, FAR had a notably better (lower) average mean square error of prediction than PCR. The performance of FAR and PLSR was not notably different under the dependent structures. Overall, therefore, FAR performed better than either PLSR or PCR.
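The kind of over-fitting described for PLSR under the independence structure can be seen in a toy setting with a minimal NumPy sketch. Here $y$ is drawn independently of $\mathbf{x}$ with $n < p$. PLSR (a NIPALS-style PLS1) and PCR, each with the same number of components, are then compared on training MSE and on MSEP for an independent test sample. The dimensions, component count, seed, and function names (`fit_pls1`, `fit_pcr`, `mse`) are hypothetical choices for illustration. The sketch does not reproduce the paper's design, whose parameters were estimated from real data, and FAR is omitted to keep it short.

```python
import numpy as np


def fit_pls1(X, y, k):
    """PLS1 via NIPALS-style deflation; returns (y_mean, x_mean, beta)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(k):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight: direction of max covariance with y
        t = E @ w                       # score vector
        tt = t @ t
        p_load = E.T @ t / tt           # x-loading
        q_load = (f @ t) / tt           # y-loading
        E = E - np.outer(t, p_load)     # deflate X
        f = f - q_load * t              # deflate y
        W.append(w)
        P.append(p_load)
        q.append(q_load)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # coefficients on centered X
    return y_mean, x_mean, beta


def fit_pcr(X, y, k):
    """Regress y on the first k principal component scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    coef = (U[:, :k].T @ (y - y_mean)) / s[:k]  # OLS on scores U_k diag(s_k)
    return y_mean, x_mean, Vt[:k].T @ coef


def mse(model, X, y):
    y_mean, x_mean, beta = model
    return np.mean((y - (y_mean + (X - x_mean) @ beta)) ** 2)


rng = np.random.default_rng(1)
n, p, k = 30, 300, 3
X_tr, y_tr = rng.normal(size=(n, p)), rng.normal(size=n)  # y independent of x
X_te, y_te = rng.normal(size=(1000, p)), rng.normal(size=1000)

pls = fit_pls1(X_tr, y_tr, k)
pcr = fit_pcr(X_tr, y_tr, k)
pls_train, pls_test = mse(pls, X_tr, y_tr), mse(pls, X_te, y_te)
pcr_train, pcr_test = mse(pcr, X_tr, y_tr), mse(pcr, X_te, y_te)
print(f"PLSR train {pls_train:.3f} / test {pls_test:.3f}")
print(f"PCR  train {pcr_train:.3f} / test {pcr_test:.3f}")
```

Because PLSR chooses its directions to maximize covariance with the training $y$, it can fit pure noise almost perfectly when $p > n$. PCR's directions do not depend on $y$, so its training error stays close to its test error. This is also why the gap between training MSE and test MSEP, rather than training fit alone, is the quantity to watch.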
Pages: 115-132
Page count: 17