MacroPCA: An All-in-One PCA Method Allowing for Missing Values as Well as Cellwise and Rowwise Outliers

Cited by: 31
Authors
Hubert, Mia [1]
Rousseeuw, Peter J. [1]
Van den Bossche, Wannes [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Math, BE-3001 Leuven, Belgium
Funding
EU Horizon 2020
Keywords
Detecting deviating cells; Outlier map; Principal component analysis; Residual map; Robust estimation; PRINCIPAL COMPONENT ANALYSIS; MULTIVARIATE LOCATION; ROBUST ESTIMATION; COVARIANCE; ESTIMATORS; REGRESSION; SCATTER; PLS;
DOI
10.1080/00401706.2018.1562989
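Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract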
Multivariate data are typically represented by a rectangular matrix (table) in which the rows are the objects (cases) and the columns are the variables (measurements). When there are many variables one often reduces the dimension by principal component analysis (PCA), which in its basic form is not robust to outliers. Much research has focused on handling rowwise outliers, that is, rows that deviate from the majority of the rows in the data (e.g., they might belong to a different population). In recent years also cellwise outliers are receiving attention. These are suspicious cells (entries) that can occur anywhere in the table. Even a relatively small proportion of outlying cells can contaminate over half the rows, which causes rowwise robust methods to break down. In this article, a new PCA method is constructed which combines the strengths of two existing robust methods to be robust against both cellwise and rowwise outliers. At the same time, the algorithm can cope with missing values. As of yet it is the only PCA method that can deal with all three problems simultaneously. Its name MacroPCA stands for PCA allowing for Missingness And Cellwise & Rowwise Outliers. Several simulations and real datasets illustrate its robustness. New residual maps are introduced, which help to determine which variables are responsible for the outlying behavior. The method is well-suited for online process control.
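The claim that a small proportion of outlying cells can contaminate over half the rows can be checked with a short calculation. The sketch below is illustrative only and is not taken from the article: it assumes each cell is outlying independently with probability `eps` (an assumption made here for simplicity), so a row with `d` variables is fully clean with probability `(1 - eps) ** d`. The function names are hypothetical.

```python
def contaminated_row_fraction(eps: float, d: int) -> float:
    """Expected fraction of rows containing at least one outlying cell.

    Assumption (not from the article): every cell is outlying
    independently with probability eps, and each row has d cells.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    if d < 1:
        raise ValueError("d must be a positive integer")
    return 1.0 - (1.0 - eps) ** d


def min_columns_for_majority(eps: float) -> int:
    """Smallest number of columns d at which more than half of the
    rows are expected to contain at least one outlying cell."""
    d = 1
    while contaminated_row_fraction(eps, d) <= 0.5:
        d += 1
    return d


if __name__ == "__main__":
    for eps in (0.01, 0.05, 0.10):
        print(eps, min_columns_for_majority(eps))
```

Under this independence assumption, 5% outlying cells already affect more than half of the rows once the table has 14 or more variables, which is why a method that only downweights whole rows can break down on such data.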
Pages: 459-473
Number of pages: 15