Factor Analysis Regression for Predictive Modeling with High-Dimensional Data

Cited by: 0

Authors:

Randy Carter

Netsanet Michael

Affiliations:

[1] State University of New York at Buffalo, Department of Biostatistics

[2] The Boeing Company, Boeing Commercial Airplanes

Source:

Journal of Quantitative Economics | 2022 / Vol. 20

Keywords:

Bilinear factor model; Principal component analysis; Principal component regression; Partial least squares; Factor structure covariance matrix; Factor analysis regression; Mean square error of prediction; Monte Carlo studies; Cross-validation;

DOI:

Not available

Abstract:

Factor analysis regression (FAR) of $y_i$ on $\boldsymbol{x}_i = (x_{1i}, x_{2i}, \ldots, x_{pi})$, $i = 1, 2, \ldots, n$, has been studied only in the low-dimensional case ($p < n$), using maximum likelihood (ML) factor extraction. The ML method breaks down in high-dimensional cases ($p > n$). In this paper, we develop a high-dimensional version of FAR based on a computationally efficient method of factor extraction. We compare the performance of our high-dimensional FAR with partial least squares regression (PLSR) and principal component regression (PCR) under three underlying correlation structures: arbitrary correlation, factor model correlation, and independence of $y$ and $\boldsymbol{x}$. Under each structure, we generated Monte Carlo training samples of sizes $n < p$ from a multivariate normal distribution with that structure. Parameters were fixed at estimates obtained from analyses of real data sets. Under the independence structure, we observed severe over-fitting by PLSR compared to FAR and PCR. Under the two dependent structures, FAR had a notably better average mean square error of prediction than PCR, while the performance of FAR and PLSR was not notably different. Thus, overall, FAR performed better than either PLSR or PCR.
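The Monte Carlo design described in the abstract can be illustrated with a minimal sketch. It is not the authors' FAR procedure, and it does not use their parameter values. It assumes a hypothetical factor-model structure, draws training samples with n < p, and estimates the average mean square error of prediction (MSEP) for PCR, one of the comparison methods, against a predict-the-mean baseline. All function names, the loadings, the noise level, and the sample sizes are illustrative assumptions.

```python
import numpy as np

def draw_sample(n, loadings, beta, rng, noise=0.5):
    """Draw (X, y) from a jointly normal factor model (hypothetical parameters)."""
    p, k = loadings.shape
    f = rng.normal(size=(n, k))                      # latent factors
    X = f @ loadings.T + noise * rng.normal(size=(n, p))
    y = f @ beta + noise * rng.normal(size=n)
    return X, y

def pcr_fit_predict(X_train, y_train, X_test, k):
    """Principal component regression on the top-k components."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    # The SVD of the centered data is well defined even when p > n.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:k].T                                     # p x k directions
    gamma, *_ = np.linalg.lstsq(Xc @ V, y_train - y_mean, rcond=None)
    return y_mean + (X_test - x_mean) @ V @ gamma

def average_msep(n=30, p=200, k=3, n_test=500, reps=50, seed=0):
    """Average test MSEP over Monte Carlo replicates with n < p."""
    rng = np.random.default_rng(seed)
    loadings = rng.normal(size=(p, k))               # assumed x-loadings
    beta = np.ones(k)                                # assumed y-loadings
    pcr_err, base_err = [], []
    for _ in range(reps):
        X_tr, y_tr = draw_sample(n, loadings, beta, rng)
        X_te, y_te = draw_sample(n_test, loadings, beta, rng)
        pred = pcr_fit_predict(X_tr, y_tr, X_te, k)
        pcr_err.append(np.mean((y_te - pred) ** 2))
        base_err.append(np.mean((y_te - y_tr.mean()) ** 2))
    return float(np.mean(pcr_err)), float(np.mean(base_err))

if __name__ == "__main__":
    pcr_msep, baseline_msep = average_msep()
    print(f"PCR MSEP: {pcr_msep:.3f}, mean-only MSEP: {baseline_msep:.3f}")
```

The same loop could be extended with other predictors (for example PLSR or a factor-based regression) and with other covariance structures, such as drawing y independently of X to probe over-fitting.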

Pages: 115-132

Page count: 17

