Outlier detection for skewed data

Cited by: 193
Authors
Hubert, Mia [1]
Van der Veeken, Stephan [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Math, LSTAT, B-3001 Louvain, Belgium
Keywords
outlier detection; boxplot; bagplot; skewness; outlyingness;
DOI
10.1002/cem.1123
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Most outlier detection rules for multivariate data are based on the assumption of elliptical symmetry of the underlying distribution. We propose an outlier detection method which does not need the assumption of symmetry and does not rely on visual inspection. Our method is a generalization of the Stahel-Donoho outlyingness. The latter approach assigns to each observation a measure of outlyingness, which is obtained by projection pursuit techniques that only use univariate robust measures of location and scale. To allow skewness in the data, we adjust this measure of outlyingness by using a robust measure of skewness as well. The observations corresponding to an outlying value of the adjusted outlyingness (AO) are then considered as outliers. For bivariate data, our approach leads to two graphical representations. The first one is a contour plot of the AO values. We also construct an extension of the boxplot for bivariate data, in the spirit of the bagplot [1] which is based on the concept of half space depth. We illustrate our outlier detection method on several simulated and real data. Copyright (c) 2008 John Wiley & Sons, Ltd.
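The abstract describes the adjusted outlyingness (AO) only in words. As a rough illustration, the Python sketch below projects the data onto random unit directions and scales each point's distance from the projected median by a skewness-adjusted whisker length, keeping the largest value over all directions. Everything specific here is an assumption not stated in this record: the medcouple as the robust skewness measure, the exponential fence factors borrowed from the adjusted boxplot, the plain random-direction search, and all function names. It is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def medcouple(x):
    """Naive O(n^2) medcouple (robust skewness). Simplification: pairs tied
    with the median on both sides are skipped instead of using the special
    tie-breaking kernel."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    lo = x[x <= med]
    hi = x[x >= med]
    diff = hi[:, None] - lo[None, :]
    num = (hi[:, None] - med) - (med - lo[None, :])
    mask = diff > 0
    return float(np.median(num[mask] / diff[mask]))

def adjusted_outlyingness_1d(z):
    """Univariate AO: distance to the median divided by the distance from the
    median to the skewness-adjusted whisker on the same side. Assumes the
    projection is non-degenerate (both whiskers differ from the median)."""
    z = np.asarray(z, dtype=float)
    q1, med, q3 = np.percentile(z, [25, 50, 75])
    iqr = q3 - q1
    mc = medcouple(z)
    # Assumed fence constants, taken from the adjusted boxplot.
    if mc >= 0:
        lo_fence = q1 - 1.5 * np.exp(-4 * mc) * iqr
        hi_fence = q3 + 1.5 * np.exp(3 * mc) * iqr
    else:
        lo_fence = q1 - 1.5 * np.exp(-3 * mc) * iqr
        hi_fence = q3 + 1.5 * np.exp(4 * mc) * iqr
    w1 = z[z >= lo_fence].min()  # lower whisker end
    w2 = z[z <= hi_fence].max()  # upper whisker end
    return np.where(z >= med, (z - med) / (w2 - med), (med - z) / (med - w1))

def adjusted_outlyingness(X, n_dirs=250, seed=0):
    """Multivariate AO approximated as the maximum univariate AO over
    n_dirs random unit directions (a simplified projection pursuit)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    ao = np.zeros(X.shape[0])
    for a in dirs:
        ao = np.maximum(ao, adjusted_outlyingness_1d(X @ a))
    return ao
```

Observations would then be flagged when their AO value is itself outlying among all AO values, for example by applying a univariate outlier rule to the output of `adjusted_outlyingness`; the choice of that cutoff is left open here.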
Pages: 235-246
Number of pages: 12
Related papers
21 in total
[1]   The multivariate skew-normal distribution [J].
Azzalini, A ;
DallaValle, A .
BIOMETRIKA, 1996, 83 (04) :715-726
[2]   A robustification of independent component analysis [J].
Brys, G ;
Hubert, M ;
Rousseeuw, PJ .
JOURNAL OF CHEMOMETRICS, 2005, 19 (5-7) :364-375
[3]   A robust measure of skewness [J].
Brys, G ;
Hubert, M ;
Struyf, A .
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2004, 13 (04) :996-1017
[5]   On describing multivariate skewed distributions: a directional approach [J].
Ferreira, Jose T. A. S. ;
Steel, Mark F. J. .
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2006, 34 (03) :411-429
[6]   The influence function of the Stahel-Donoho estimator of multivariate location and scatter [J].
Gervini, D .
STATISTICS & PROBABILITY LETTERS, 2002, 60 (04) :425-435
[7]   ROBPCA: A new approach to robust principal component analysis [J].
Hubert, M ;
Rousseeuw, PJ ;
Vanden Branden, K .
TECHNOMETRICS, 2005, 47 (01) :64-79
[8]   The truncated mean of an asymmetric distribution [J].
Marazzi, A ;
Ruffieux, C .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1999, 32 (01) :79-100
[9]   Weighted likelihood equations with bootstrap root search [J].
Markatou, M ;
Basu, A ;
Lindsay, BG .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1998, 93 (442) :740-750
[10]   THE BEHAVIOR OF THE STAHEL-DONOHO ROBUST MULTIVARIATE ESTIMATOR [J].
MARONNA, RA ;
YOHAI, VJ .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1995, 90 (429) :330-341