Outlier detection in large data sets

Cited by: 54
Authors
Buzzi-Ferraris, Guido [1]
Manenti, Flavio [1]
Affiliations
[1] Politecn Milan, Dipartimento Chim Mat & Ingn Chim Giulio Natta, I-20133 Milan, Italy
Keywords
Outliers; Reliable parameter estimation; Robustness; Large data sets
DOI
10.1016/j.compchemeng.2010.11.004
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In this paper we propose a method for detecting outliers correctly, based on a new technique that evaluates the mean, the variance, and the outliers simultaneously. The method self-regulates its robustness to suit the experimental data set under analysis, and thereby overcomes the shortcomings of (i) non-robust methods such as the least sum of squares and (ii) methods such as the least trimmed sum of squares, which require the user to define a trimmed subset of experimental points. In addition, (iii) it makes it possible to read the data set only once to evaluate the mean, variance, and outliers of a population while preserving robustness. (C) 2010 Published by Elsevier Ltd.
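The two shortcomings named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method, which the abstract does not specify. It only contrasts the ordinary (least sum of squares) mean with a univariate least trimmed squares location estimate. The function name `lts_location`, the sample values, and the subset size `h` are assumptions made for illustration.

```python
from statistics import mean


def lts_location(data, h):
    """Univariate least trimmed squares (LTS) location estimate.

    Returns the mean of the contiguous window of h sorted points with the
    smallest sum of squared deviations, plus the points left outside it.
    The subset size h is chosen by the user (hypothetical helper).
    """
    xs = sorted(data)
    best = None
    for i in range(len(xs) - h + 1):
        window = xs[i:i + h]
        m = mean(window)
        ss = sum((x - m) ** 2 for x in window)
        if best is None or ss < best[0]:
            best = (ss, m, xs[:i] + xs[i + h:])
    _, location, trimmed = best
    return location, trimmed


# Illustrative data: six readings near 10 and one gross error.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 55.0]

# (i) The least-squares estimate of location is the plain mean,
# which the single gross error pulls far away from 10.
ls_mean = mean(data)

# (ii) LTS resists the gross error, but only because h was set by hand.
lts_mean, trimmed = lts_location(data, h=6)
```

With `h=6` the trimmed point is the gross error, but LTS always discards `len(data) - h` points whether or not they are outliers, which is why the choice of `h` falls on the user.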
Pages: 388-390
Number of pages: 3