Principal component analysis with missing values: a comparative survey of methods

Authors
Stéphane Dray
Julie Josse
Affiliations
[1] Université de Lyon, Applied Mathematics Department
[2] Université Lyon 1
[3] CNRS
[4] UMR5558
[5] Laboratoire de Biométrie et Biologie Evolutive
[6] Agrocampus Ouest
Source
Plant Ecology | 2015 / Vol. 216
Keywords
Imputation; Ordination; PCA; Traits
DOI
Not available
Abstract
Principal component analysis (PCA) is a standard technique for summarizing the main structures of a data table containing measurements of several quantitative variables for a number of individuals. Here, we study the case where some of the data values are missing and review methods that adapt PCA to missing data. In plant ecology, this statistical challenge relates to the current effort to compile global plant functional trait databases, which produces matrices with a large proportion of missing values. We present several techniques to handle or estimate (impute) missing values in PCA and compare them using theoretical considerations. We carried out a simulation study to evaluate the relative merits of the different approaches in various situations (correlation structure, number of variables and individuals, and percentage of missing values) and also applied them to a real data set. Lastly, we discuss the advantages and drawbacks of these approaches, the potential pitfalls, and the challenges that remain to be addressed.
Pages: 657-667
Page count: 10