Detection of outliers

Cited by: 94
Authors
Hadi, Ali S. [1, 2]
Imon, A. H. M. Rahmatullah [3]
Werner, Mark [1]
Affiliations
[1] Amer Univ Cairo, Dept Math & Actuarial Sci, Cairo, Egypt
[2] Cornell Univ, Dept Stat Sci, Ithaca, NY USA
[3] Ball State Univ, Dept Math Sci, Muncie, IN USA
Keywords
data mining; density-based outliers; distance-based outliers; Mahalanobis distance; identification of outliers
DOI
10.1002/wics.6
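Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract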
We present an overview of the major developments in the area of detection of outliers. These include projection pursuit approaches as well as Mahalanobis distance-based procedures. We also discuss principal component-based methods, since these are most applicable to the large datasets that have become more prevalent in recent years. The major algorithms within each category are briefly discussed, together with current challenges and possible directions of future research. © 2009 John Wiley & Sons, Inc.
Pages: 57-70
Page count: 14