Selecting the Number of Principal Components in Functional Data

Cited by: 80
Authors
Li, Yehua [1]
Wang, Naisyin [2]
Carroll, Raymond J. [3]
Affiliations
[1] Iowa State Univ, Dept Stat & Stat Lab, Ames, IA 50011 USA
[2] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
[3] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
Funding
National Science Foundation (United States);
Keywords
Akaike information criterion; Bayesian information criterion; Functional data analysis; Kernel smoothing; Model selection; Colon carcinogenesis; Longitudinal data; Nonparametric regression; Linear regression; Models; Dimensionality; Reduction; Curves;
DOI
10.1080/01621459.2013.788980
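Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208; 070103; 0714;
Abstract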
Functional principal component analysis (FPCA) has become the most widely used dimension reduction tool for functional data analysis. We consider functional data measured at random, subject-specific time points, contaminated with measurement error, allowing for both sparse and dense functional data, and propose novel information criteria to select the number of principal components in such data. We propose a Bayesian information criterion based on marginal modeling that can consistently select the number of principal components for both sparse and dense functional data. For dense functional data, we also develop an Akaike information criterion based on the expected Kullback-Leibler information under a Gaussian assumption. In connection with the time series literature, we also consider a class of information criteria proposed for factor analysis of multivariate time series and show that they are still consistent for dense functional data, if a prescribed undersmoothing scheme is undertaken in the FPCA algorithm. We perform intensive simulation studies and show that the proposed information criteria vastly outperform existing methods for this type of data. Surprisingly, our empirical evidence shows that our information criteria proposed for dense functional data also perform well for sparse functional data. An empirical example using colon carcinogenesis data is also provided to illustrate the results. Supplementary materials for this article are available online.
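The abstract mentions a class of information criteria from the factor-analysis literature (Bai and Ng, 2002) that remains usable for dense functional data. As a rough illustration of how such a criterion picks the number of components, the sketch below applies a Bai-Ng-type criterion, log(V(k)) plus a penalty that grows linearly in k, to curves observed on a common dense grid. It is not the marginal BIC or the AIC proposed in the article, and it skips the kernel smoothing and undersmoothing step the abstract refers to. The function name `select_num_components`, the simulated eigenfunctions, and all parameter values are assumptions made only for this example.

```python
import numpy as np


def select_num_components(X, k_max):
    """Pick the number of components with a Bai-Ng-type criterion.

    X     : (n, m) array, n curves observed on a common grid of m points.
    k_max : largest candidate number of components (hypothetical choice).
    Returns the selected k and the criterion value for k = 0..k_max.
    """
    n, m = X.shape
    Xc = X - X.mean(axis=0)                  # remove the mean curve
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    total = np.sum(s ** 2)
    # Penalty per component, in the form of Bai and Ng's IC_p1.
    penalty = (n + m) / (n * m) * np.log(n * m / (n + m))
    ic = []
    for k in range(k_max + 1):
        # Mean squared residual after a rank-k fit.
        v_k = (total - np.sum(s[:k] ** 2)) / (n * m)
        ic.append(np.log(v_k) + k * penalty)
    ic = np.array(ic)
    return int(np.argmin(ic)), ic


# Simulated dense data: 3 true components plus measurement error (assumed).
rng = np.random.default_rng(0)
n, m = 200, 50
t = np.linspace(0.0, 1.0, m)
phi = np.vstack([
    np.sqrt(2) * np.sin(2 * np.pi * t),
    np.sqrt(2) * np.cos(2 * np.pi * t),
    np.sqrt(2) * np.sin(4 * np.pi * t),
])
scores = rng.normal(size=(n, 3)) * np.sqrt([4.0, 2.0, 1.0])
X = scores @ phi + rng.normal(scale=0.5, size=(n, m))

k_hat, ic = select_num_components(X, k_max=8)
print("selected number of components:", k_hat)
```

The penalty term is what keeps the criterion from always preferring more components: each extra component must reduce the log residual variance by more than the penalty to be selected.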
Pages: 1284-1294
Page count: 11
Related papers
24 in total
  • [1] [Anonymous], 2006, Model selection and model averaging, DOI 10.1017/CBO9780511790485.003
  • [2] Determining the number of factors in approximate factor models
    Bai, JS
    Ng, S
    [J]. ECONOMETRICA, 2002, 70 (01) : 191 - 221
  • [3] Bayesian hierarchical spatially correlated functional data analysis with application to colon carcinogenesis
    Baladandayuthapani, Veerabhadran
    Mallick, Bani K.
    Hong, Mee Young
    Lupton, Joanne R.
    Turner, Nancy D.
    Carroll, Raymond J.
    [J]. BIOMETRICS, 2008, 64 (01) : 64 - 73
  • [4] Prediction in functional linear regression
    Cai, T. Tony
    Hall, Peter
    [J]. ANNALS OF STATISTICS, 2006, 34 (05) : 2159 - 2179
  • [5] Capra WB, 1997, J AM STAT ASSOC, V92, P72
  • [6] On properties of functional principal components analysis
    Hall, P
    Hosseini-Nasab, M
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2006, 68 : 109 - 126
  • [7] Assessing the finite dimensionality of functional data
    Hall, Peter
    Vial, Celine
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2006, 68 : 689 - 705
  • [8] Properties of principal component methods for functional and longitudinal data analysis
    Hall, Peter
    Mueller, Hans-Georg
    Wang, Jane-Ling
    [J]. ANNALS OF STATISTICS, 2006, 34 (03) : 1493 - 1517
  • [9] Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion
    Hurvich, CM
    Simonoff, JS
    Tsai, CL
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1998, 60 : 271 - 293
  • [10] REGRESSION AND TIME-SERIES MODEL SELECTION IN SMALL SAMPLES
    HURVICH, CM
    TSAI, CL
    [J]. BIOMETRIKA, 1989, 76 (02) : 297 - 307