MWPCR: Multiscale Weighted Principal Component Regression for High-Dimensional Prediction

Cited by: 8
Authors
Zhu, Hongtu [1, 2]
Shen, Dan [3, 4]
Peng, Xuewei [5]
Liu, Leo Yufeng [1, 6]
Affiliations
[1] Univ Texas MD Anderson Canc Ctr, Dept Biostat, Houston, TX 77030 USA
[2] Univ N Carolina, Dept Biostat, Chapel Hill, NC USA
[3] Univ S Florida, Interdisciplinary Data Sci Consortium, Tampa, FL USA
[4] Univ S Florida, Dept Math & Stat, Tampa, FL USA
[5] Texas A&M Univ, College Stn, TX USA
[6] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC USA
Keywords
Alzheimer; Feature; Principal component analysis; Regression; Spatial; Supervised; MODELS; CLASSIFICATION; VARIABLES; TUTORIAL;
DOI
10.1080/01621459.2016.1261710
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
We propose a multiscale weighted principal component regression (MWPCR) framework for the use of high-dimensional features with strong spatial structure (e.g., smoothness and correlation) to predict an outcome variable, such as disease status. This development is motivated by identifying imaging biomarkers that could potentially aid detection, diagnosis, assessment of prognosis, prediction of response to treatment, and monitoring of disease status, among many others. The MWPCR can be regarded as a novel integration of principal component analysis (PCA), kernel methods, and regression models. In MWPCR, we introduce various weight matrices to prewhiten high-dimensional feature vectors, perform matrix decomposition for both dimension reduction and feature extraction, and build a prediction model by using the extracted features. Examples of such weight matrices include an importance score weight matrix for the selection of individual features at each location and a spatial weight matrix for the incorporation of the spatial pattern of feature vectors. We integrate the importance score weights with the spatial weights to recover the low-dimensional structure of high-dimensional features. We demonstrate the utility of our methods through extensive simulations and real data analyses of the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Supplementary materials for this article are available online.
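The three steps named in the abstract (weight the features, decompose the weighted matrix, regress on the extracted components) can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the abstract does not specify how the weights are built, so the absolute marginal correlation used as the importance score, the row-normalized Gaussian kernel used as the spatial weight, the single fixed bandwidth, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def importance_weights(X, y):
    """Assumed importance score: |marginal correlation| of each feature with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def spatial_weights(coords, bandwidth):
    """Assumed spatial weight: row-normalized Gaussian kernel on feature locations."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return W / W.sum(axis=1, keepdims=True)

def fit_weighted_pcr(X, y, coords, n_components=3, bandwidth=1.0):
    """Weight the features, take a truncated SVD, then fit least squares on the scores."""
    mu = X.mean(axis=0)
    q = importance_weights(X, y)          # one weight per feature (location)
    S = spatial_weights(coords, bandwidth)
    Xw = ((X - mu) * q) @ S.T             # importance-weight, then smooth spatially
    _, _, Vt = np.linalg.svd(Xw, full_matrices=False)
    V = Vt[:n_components].T               # leading right singular vectors
    scores = Xw @ V                       # extracted low-dimensional features
    A = np.column_stack([np.ones(len(y)), scores])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mu": mu, "q": q, "S": S, "V": V, "beta": beta}

def predict_weighted_pcr(model, X):
    """Apply the stored weights and loadings to new data and predict the outcome."""
    Xw = ((X - model["mu"]) * model["q"]) @ model["S"].T
    scores = Xw @ model["V"]
    return model["beta"][0] + scores @ model["beta"][1:]
```

With features on a one-dimensional grid, `coords` would be an array of shape `(p, 1)`; for image data it would hold voxel coordinates. A multiscale version would presumably combine several bandwidths, which this sketch does not attempt.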
Pages: 1009-1021
Number of pages: 13