Statistical Method for Finding Outliers in Multivariate Data using a Boxplot and Multiple Linear Regression

Times cited: 2
Authors
Thanwiset, Theeraphat [1]
Srisodaphol, Wuttichai [1]
Affiliation
[1] Khon Kaen Univ, Dept Stat, Khon Kaen 40002, Thailand
Source
SAINS MALAYSIANA | 2023 / Vol. 52 / No. 9
Keywords
Boxplot; multivariate data; multiple linear regression; outlier
DOI
10.17576/jsm-2023-5209-20
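Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract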
The objective of this study was to propose a method for detecting outliers in multivariate data based on a boxplot and multiple linear regression. In the proposed method, the boxplot is first applied to every variable to split the data set into two subsets: normal data (observations lying within the lower and upper fences of the boxplot for all variables) and data that could be outliers. The normal data are then used to fit a multiple linear regression model, and the maximum residual error of this model is used as the cut-off point. To evaluate the performance of the proposed method, a simulation study was conducted on multivariate normal data without contamination and with contamination at various levels. The proposed method was compared with previous methods, namely the Mahalanobis distance and the Mahalanobis distance with robust estimators obtained from the minimum volume ellipsoid method, the minimum covariance determinant method, and the minimum vector variance method. The results showed that the proposed method performed best among the compared methods at all contamination levels. When applied to real data, the proposed method also identified outliers that were consistent with the real data.
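The two-stage procedure in the abstract (boxplot screening, then a regression-based cut-off) can be sketched as below. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions the abstract does not settle: the standard Tukey fences with multiplier `k = 1.5`, a single user-chosen response column (the last one by default) regressed on the remaining variables, residuals compared in absolute value, and only the boxplot candidates being eligible to be flagged. The function names `boxplot_fences` and `boxplot_mlr_outliers` are hypothetical.

```python
import numpy as np


def boxplot_fences(X, k=1.5):
    """Per-column Tukey fences: Q1 - k*IQR and Q3 + k*IQR (k = 1.5 assumed)."""
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def boxplot_mlr_outliers(X, response=-1, k=1.5):
    """Hypothetical sketch of boxplot screening followed by an MLR cut-off.

    Returns a boolean mask of flagged rows and the residual cut-off.
    The choice of response column is an assumption, not from the paper.
    """
    X = np.asarray(X, dtype=float)
    lower, upper = boxplot_fences(X, k)

    # Step 1: a row is "normal" only if every variable lies within its fences;
    # all other rows are outlier candidates.
    normal = np.all((X >= lower) & (X <= upper), axis=1)

    # Step 2: fit a multiple linear regression (with intercept) on the normal
    # rows only: the response column on the remaining columns.
    y = X[:, response]
    Z = np.delete(X, response, axis=1)
    A = np.column_stack([np.ones(len(X)), Z])
    beta, *_ = np.linalg.lstsq(A[normal], y[normal], rcond=None)

    # Step 3: the cut-off is the largest absolute residual among normal rows.
    resid = np.abs(y - A @ beta)
    cutoff = resid[normal].max()

    # Step 4: a candidate row is flagged when its residual exceeds the cut-off.
    outliers = (~normal) & (resid > cutoff)
    return outliers, cutoff
```

For example, on data where the third column is roughly a linear combination of the first two, a row with an extreme third value falls outside that variable's fences, has a residual far above the cut-off, and is flagged. Comparing residuals against the largest residual seen among the screened normal rows means the cut-off adapts to the noise level of the data rather than relying on a fixed distributional quantile, as the Mahalanobis distance approach does.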
Pages: 2725-2732
Number of pages: 8