Outlier detection for high dimensional data using the Comedian approach

Cited by: 27
Authors
Sajesh, T. A. [1]
Srinivasan, M. R. [1]
Affiliation
[1] Univ Madras, Dept Stat, Madras 600005, Tamil Nadu, India
Keywords
outlier detection; breakdown value; Mahalanobis distance; robust statistics; multivariate location; multiple outliers; breakdown points; matrices; dispersion; estimators; scatter
DOI
10.1080/00949655.2011.552504
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Detecting outliers is an important aspect of data analysis, because outliers can distort the resulting inference. Various methods for detecting outliers in multivariate data using the Mahalanobis distance are available in the literature [V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley & Sons, Chichester, 1994]. This paper proposes an alternative method of outlier detection based on the comedian introduced by Falk [On MAD and Comedians, Ann. Inst. Statist. Math. 49 (1997), pp. 615-644]. The proposed method has a high breakdown value and is computationally efficient. Its success rate (SR) and false detection rate (FDR) are studied through a simulation study and compared with those of several well-known outlier detection methods. The Comedian method has a high SR and a low FDR for all parameter combinations considered. Removing or down-weighting the detected outliers yields highly robust, approximately affine equivariant estimators of multivariate location and scatter. Finally, the method is applied to well-known real data sets to evaluate its performance.
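The abstract describes the method only at a high level. As a rough illustration: Falk's comedian of two variables is COM(X, Y) = med((X - med X)(Y - med Y)), so COM(X, X) is the squared MAD. The sketch below, assuming NumPy is available, builds the matrix of pairwise comedians, uses it in place of the covariance matrix in a Mahalanobis-type distance, and flags points beyond a chi-square cutoff. The median rescaling of the distances, the 0.975 cutoff, the Wilson-Hilferty quantile approximation, the toy simulation, and all function names are hypothetical choices for illustration, not necessarily the authors' exact procedure. SR and FDR are computed here as the share of planted outliers flagged and the share of clean points flagged, which is one plausible reading of those terms.

```python
import numpy as np
from statistics import NormalDist


def comedian(x, y):
    """Falk's comedian: med((X - med X) * (Y - med Y))."""
    return np.median((x - np.median(x)) * (y - np.median(y)))


def comedian_matrix(data):
    """p x p matrix whose (j, k) entry is COM(X_j, X_k); diagonal = MAD^2."""
    p = data.shape[1]
    com = np.empty((p, p))
    for j in range(p):
        for k in range(j, p):
            com[j, k] = com[k, j] = comedian(data[:, j], data[:, k])
    return com


def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile (avoids SciPy)."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * np.sqrt(a)) ** 3


def comedian_outliers(data, prob=0.975):
    """Flag rows whose comedian-based squared distance exceeds a cutoff.

    ASSUMPTION: the median rescaling and chi-square cutoff are illustrative,
    not necessarily the rule used in the paper. The comedian matrix is not
    guaranteed to be positive definite; this sketch does not check for that.
    """
    data = np.asarray(data, dtype=float)
    p = data.shape[1]
    diff = data - np.median(data, axis=0)  # coordinatewise median as center
    com_inv = np.linalg.inv(comedian_matrix(data))
    # Row-wise quadratic form: d2_i = diff_i^T COM^{-1} diff_i.
    d2 = np.einsum("ij,ij->i", diff @ com_inv, diff)
    # Rescale so the median distance matches the median of chi-square(p).
    d2 = d2 * chi2_quantile_wh(0.5, p) / np.median(d2)
    return d2 > chi2_quantile_wh(prob, p)


# Hypothetical toy simulation: 10 planted outliers shifted by +10 in every
# coordinate, followed by 190 clean N(0, I_5) rows.
rng = np.random.default_rng(0)
n_out, n_clean, p = 10, 190, 5
outliers = rng.standard_normal((n_out, p)) + 10.0
clean = rng.standard_normal((n_clean, p))
data = np.vstack([outliers, clean])
flags = comedian_outliers(data)

success_rate = flags[:n_out].mean()          # share of planted outliers flagged
false_detection_rate = flags[n_out:].mean()  # share of clean rows flagged
print(success_rate, false_detection_rate)
```

Because each comedian is a median of products, a small fraction of gross outliers barely moves the matrix, which is the intuition behind the high breakdown value claimed above. On real data, a check for positive definiteness before inversion would be prudent.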
Pages: 745-757
Number of pages: 13
References (24 in total)
[1] [Anonymous]. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, 1994.
[2] Atkinson, A. C. Fast very robust methods for the detection of multiple outliers. Journal of the American Statistical Association, 1994, 89(428): 1329-1339.
[3] Brownlee, K. A. Stat. Theory Methodol., 1965.
[4] Campbell, N. A. Applied Statistics, 1980, 29: 231. DOI 10.2307/2346896.
[5] Childs, A.; Balakrishnan, N.; Srinivasan, M. R. Unified scheme for testing for outliers in linear models. Journal of Statistical Computation and Simulation, 2006, 76(1): 21-39.
[6] Croux, C. Computational Statistics, 1992, 1: 411. DOI 10.1007/978-3-662-26811-7_58.
[7] Daudin, J. J. Statistics, 1988, 19: 214.
[8] Devlin, S. J.; Gnanadesikan, R.; Kettenring, J. R. Robust estimation of dispersion matrices and principal components. Journal of the American Statistical Association, 1981, 76(374): 354-362.
[9] Donoho, D. L. Technical report, 1982.
[10] Falk, M. On MAD and comedians. Annals of the Institute of Statistical Mathematics, 1997, 49(4): 615-644.