Principal component analysis with missing values: a comparative survey of methods

Authors
Stéphane Dray
Julie Josse
Affiliations
[1] Université de Lyon, Applied Mathematics Department
[2] Université Lyon 1
[3] CNRS
[4] UMR5558
[5] Laboratoire de Biométrie et Biologie Evolutive
[6] Agrocampus Ouest
Source
Plant Ecology | 2015 / Vol. 216
Keywords
Imputation; Ordination; PCA; Traits
DOI
Not available
Abstract
Principal component analysis (PCA) is a standard technique for summarizing the main structures of a data table containing the measurements of several quantitative variables for a number of individuals. Here, we study the case where some of the data values are missing and review methods that adapt PCA to missing data. In plant ecology, this statistical challenge relates to the current effort to compile global plant functional trait databases, which produces matrices with a large proportion of missing values. We present several techniques to take into account or estimate (impute) missing values in PCA and compare them using theoretical considerations. We carried out a simulation study to evaluate the relative merits of the different approaches in various situations (correlation structure, number of variables and individuals, and percentage of missing values) and also applied them to a real data set. Lastly, we discuss the advantages and drawbacks of these approaches, the potential pitfalls, and the challenges that remain to be addressed.
Pages: 657-667
Page count: 10