Outlier detection for high dimensional data using the Comedian approach

Cited by: 27
Authors
Sajesh, T. A. [1]
Srinivasan, M. R. [1]
Affiliations
[1] Univ Madras, Dept Stat, Madras 600005, Tamil Nadu, India
Keywords
outlier detection; breakdown value; Mahalanobis distance; robust statistics; multivariate location; multiple outliers; breakdown points; matrices; dispersion; estimators; scatter
DOI
10.1080/00949655.2011.552504
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Detecting outliers is an important part of data analysis, because outliers can affect inference. Various methods are available in the literature for detecting outliers in multivariate data [V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, 1994] using the Mahalanobis distance measure. This paper proposes an alternative method of outlier detection based on the comedian introduced by Falk [On MAD and Comedians, Ann. Inst. Statist. Math. 49 (1997), pp. 615-644]. The proposed method has a high breakdown value and low computation time. Two important properties, the success rate (SR) and the false detection rate (FDR), are studied and compared with those of some well-known outlier detection methods through a simulation study. The Comedian method has high SR and low FDR for all combinations of parameters. After the detected outliers are removed or down-weighted, highly robust and approximately affine equivariant estimators of multivariate location and scatter can be obtained. Finally, the method is applied to well-known real data sets to evaluate its performance.
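As a rough illustration of the comedian that the abstract refers to, the Python sketch below computes a pairwise comedian matrix using Falk's definition, COM(X, Y) = med((X - med X)(Y - med Y)). It is not the authors' detection procedure: the function name `comedian_matrix` and the toy data `X_demo` are assumptions made here, and the later steps of the method (for example, turning this matrix into a scatter estimate and robust distances) are not described in the abstract and are not shown.

```python
import numpy as np


def comedian_matrix(X):
    """Pairwise comedian matrix of the columns of X (hypothetical helper).

    Entry (i, j) is med((X_i - med X_i) * (X_j - med X_j)),
    a median-based analogue of the covariance.
    """
    X = np.asarray(X, dtype=float)
    # Center every column at its own median.
    centered = X - np.median(X, axis=0)
    p = X.shape[1]
    com = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            # Median of the elementwise products of two centered columns.
            value = np.median(centered[:, i] * centered[:, j])
            com[i, j] = value
            com[j, i] = value  # the comedian is symmetric in its arguments
    return com


# Toy data (assumed for illustration): four ordinary rows and one gross outlier.
X_demo = np.array(
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [100.0, -50.0]]
)
COM_demo = comedian_matrix(X_demo)
```

With an odd number of observations, each diagonal entry equals the square of the unscaled median absolute deviation (MAD) of that column, which is the link between the comedian and the MAD. The matrix is not guaranteed to be positive semidefinite, so it cannot be inverted blindly inside a Mahalanobis-type distance.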
Pages: 745-757
Number of pages: 13