Diagnosing Multivariate Outliers Detected by Robust Estimators

Cited by: 15
Authors
Willems, Gert [1]
Joe, Harry [2]
Zamar, Ruben [2]
Affiliations
[1] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium
[2] Univ British Columbia, Dept Stat, Vancouver, BC V6T 1Z2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Outlier diagnostics; Robust distances; Visualization of multivariate data; XGOBI;
DOI
10.1198/jcgs.2009.0005
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208; 070103; 0714;
Abstract
We propose a number of diagnostic methods that can be used whenever multiple outliers are identified by robust estimates of multivariate location and scatter. Their main purpose is to visualize the multivariate data to help determine whether the detected outliers (a) form separate clusters or (b) are isolated or randomly scattered (for example, because the distribution has heavier tails than the Gaussian). We use Mahalanobis distances and linear projections to check for separation and to reveal additional aspects of the data structure. Several real data examples are analyzed, and artificial examples are used to illustrate the diagnostic power of the proposed plots. Code to perform the diagnostics, the datasets used as examples in the article, and documentation are available in the online supplements.
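The general workflow the abstract describes (fit a robust location/scatter estimate, compute robust Mahalanobis distances, flag points beyond a chi-square cutoff, then inspect a linear projection for separation) can be sketched as follows. This is a hedged illustration, not the authors' supplementary code. Several choices are assumptions: the robust estimator is a simplified MCD-style concentration-step iteration standing in for whatever robust estimator is used; chi-square quantiles use the Wilson-Hilferty approximation; the projection direction S^{-1}(mean of flagged - mean of rest) is just one simple choice; and all function names (chi2_quantile, cstep_estimate, flag_and_project) are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def chi2_quantile(q, p):
    """Approximate chi-square(p) quantile via Wilson-Hilferty (an approximation)."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * p)
    return p * (1.0 - c + z * math.sqrt(c)) ** 3


def sq_mahalanobis(X, center, scatter):
    """Squared Mahalanobis distance of each row of X w.r.t. (center, scatter)."""
    diff = X - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(scatter), diff)


def cstep_estimate(X, max_iter=50):
    """Hypothetical simplified MCD-style estimator using concentration (C-)steps.

    Starts from the h points nearest (Euclidean) to the coordinatewise median,
    then repeatedly refits mean/covariance on the h points with the smallest
    Mahalanobis distances until the subset stops changing.
    """
    n, p = X.shape
    h = (n + p + 1) // 2
    dist0 = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    subset = np.sort(np.argsort(dist0)[:h])
    for _ in range(max_iter):
        center = X[subset].mean(axis=0)
        scatter = np.cov(X[subset], rowvar=False)
        new_subset = np.sort(np.argsort(sq_mahalanobis(X, center, scatter))[:h])
        if np.array_equal(new_subset, subset):
            break
        subset = new_subset
    # Rescale so the median squared distance matches the chi-square(p) median,
    # compensating for fitting on the h most central points only.
    d2 = sq_mahalanobis(X, center, scatter)
    scatter = scatter * np.median(d2) / chi2_quantile(0.5, p)
    return center, scatter


def flag_and_project(X, level=0.975):
    """Flag points with large robust distances; project onto a separating direction."""
    center, scatter = cstep_estimate(X)
    d2 = sq_mahalanobis(X, center, scatter)
    flagged = d2 > chi2_quantile(level, X.shape[1])
    if not flagged.any() or flagged.all():
        return d2, flagged, None
    # Assumed direction: S^{-1}(mean_flagged - mean_rest), normalized to unit length.
    gap = X[flagged].mean(axis=0) - X[~flagged].mean(axis=0)
    direction = np.linalg.solve(scatter, gap)
    direction /= np.linalg.norm(direction)
    return d2, flagged, X @ direction


# Toy data: 100 Gaussian points plus a tight cluster of 10 points near (6, 6).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)),
               rng.normal(loc=6.0, scale=0.3, size=(10, 2))])
d2, flagged, proj = flag_and_project(X)
print("flagged indices:", np.flatnonzero(flagged))
```

Roughly speaking, a clear gap in proj between the flagged points and the rest (for example in a histogram) points toward case (a), a separate cluster, while flagged points strung out along the tail without a gap look more like case (b).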
Pages: 73-91
Number of pages: 19