Factor Analysis Regression for Predictive Modeling with High-Dimensional Data

Authors
Randy Carter
Netsanet Michael
Affiliations
[1] State University of New York at Buffalo, Department of Biostatistics
[2] The Boeing Company, Boeing Commercial Airplanes
Source
Journal of Quantitative Economics | 2022 / Vol. 20
Keywords
Bilinear factor model; Principal component analysis; Principal component regression; Partial least squares; Factor structure covariance matrix; Factor analysis regression; Mean square error of prediction; Monte Carlo studies; Cross-validation;
DOI
Not available
Abstract
Factor analysis regression (FAR) of $y_i$ on $\boldsymbol{x}_i = (x_{1i}, x_{2i}, \ldots, x_{pi})$, $i = 1, 2, \ldots, n$, has been studied only in the low-dimensional case ($p < n$), using maximum likelihood (ML) factor extraction. The ML method breaks down in high-dimensional cases ($p > n$). In this paper, we develop a high-dimensional version of FAR based on a computationally efficient method of factor extraction. We compare the performance of our high-dimensional FAR with partial least squares regression (PLSR) and principal component regression (PCR) under three underlying correlation structures: arbitrary correlation, factor model correlation structure, and independence of $y$ and $\boldsymbol{x}$.
Under each structure, we generated Monte Carlo training samples of sizes $n < p$ from a multivariate normal distribution with that structure. Parameters were fixed at estimates obtained from analyses of real data sets. Under the independence structure, we observed severe over-fitting by PLSR compared to FAR and PCR. Under the two dependent structures, FAR had a notably better average mean square error of prediction than PCR, while the performance of FAR and PLSR was not notably different. Thus, overall, FAR performed better than either PLSR or PCR.
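The kind of Monte Carlo comparison described in the abstract can be illustrated with a minimal sketch. It covers only the PCR baseline, since the abstract does not specify the factor extraction method used by FAR. Everything concrete here is an assumption made for illustration: the function names `pcr_fit_predict` and `simulate_msep`, the sample sizes, the two-factor loading matrix, and the coefficient values are hypothetical and are not the parameter estimates used in the paper.

```python
import numpy as np


def pcr_fit_predict(X_train, y_train, X_test, k):
    """Principal component regression with k components, fitted via SVD."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    # The thin SVD is well defined when p > n; rows of Vt are principal axes.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                    # p x k matrix of principal axes
    scores = Xc @ V_k                 # n x k component scores
    # Score columns are orthogonal with squared norms s**2, so least squares
    # reduces to one ratio per component.
    gamma = (scores.T @ (y_train - y_mean)) / (s[:k] ** 2)
    return y_mean + (X_test - x_mean) @ V_k @ gamma


def simulate_msep(n=30, p=100, k=3, n_test=500, n_reps=50,
                  independent=True, seed=0):
    """Average mean square error of prediction over Monte Carlo replicates."""
    rng = np.random.default_rng(seed)
    q = 2                                   # hypothetical number of factors
    loadings = rng.normal(size=(p, q))      # hypothetical loading matrix
    # With beta = 0, y is independent of x; otherwise y depends on the factors.
    beta = np.zeros(q) if independent else np.array([1.0, -1.0])
    mseps = []
    for _ in range(n_reps):
        F = rng.normal(size=(n + n_test, q))
        # x is multivariate normal with a factor structure covariance matrix.
        X = F @ loadings.T + rng.normal(size=(n + n_test, p))
        y = F @ beta + rng.normal(size=n + n_test)
        y_hat = pcr_fit_predict(X[:n], y[:n], X[n:], k)
        mseps.append(np.mean((y[n:] - y_hat) ** 2))
    return float(np.mean(mseps))


if __name__ == "__main__":
    print("independence structure:", simulate_msep(independent=True))
    print("factor structure:      ", simulate_msep(independent=False))
```

In this setup the noise variance of y is 1, so an average MSEP well above 1 under the independence structure would indicate over-fitting. The same harness could be reused for other predictors by swapping out `pcr_fit_predict`.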
Pages: 115 - 132
Page count: 17