Statistical Method for Finding Outliers in Multivariate Data using a Boxplot and Multiple Linear Regression

Cited by: 1
Authors
Thanwiset, Theeraphat [1 ]
Srisodaphol, Wuttichai [1 ]
Affiliations
[1] Khon Kaen Univ, Dept Stat, Khon Kaen 40002, Thailand
Source
SAINS MALAYSIANA | 2023 / Vol. 52 / No. 09
Keywords
Boxplot; multivariate data; multiple linear regression; outlier
DOI
10.17576/jsm-2023-5209-20
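Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code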
07 ; 0710 ; 09 ;
Abstract
This study proposes a method for detecting outliers in multivariate data based on a boxplot and multiple linear regression. The boxplot is first applied to every variable to split the data set into two sets: normal data (observations lying within the lower and upper fences of the boxplot) and data that could be outliers. The normal data are then used to fit a multiple linear regression model, and the maximum residual error is taken as the cut-off point. The proposed method was evaluated in a simulation study on multivariate normal data, with and without contamination, at various contamination levels. Its performance was compared with that of existing methods, namely the Mahalanobis distance and the Mahalanobis distance with robust estimators obtained by the minimum volume ellipsoid method, the minimum covariance determinant method, and the minimum vector variance method. The results showed that the proposed method outperformed the compared methods at all contamination levels. When applied to real data, the proposed method also identified outliers that were consistent with the real data.
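The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which variable serves as the response, what fence multiplier is used, or whether the cut-off applies to absolute residuals. The sketch assumes the conventional 1.5 x IQR fences, treats one column (the hypothetical `response` argument) as the response, and uses the largest absolute residual among the boxplot-clean rows as the cut-off. The function names `boxplot_mask` and `detect_outliers` are illustrative only.

```python
import numpy as np


def boxplot_mask(X, k=1.5):
    """Return True for rows whose every variable lies within the boxplot fences.

    Assumption: fences are Q1 - k*IQR and Q3 + k*IQR with k = 1.5.
    """
    X = np.asarray(X, dtype=float)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.all((X >= lower) & (X <= upper), axis=1)


def detect_outliers(X, response=0, k=1.5):
    """Flag rows whose regression residual exceeds the cut-off.

    Assumption: column `response` is regressed on all remaining columns,
    and the cut-off is the maximum absolute residual of the clean rows.
    """
    X = np.asarray(X, dtype=float)
    clean = boxplot_mask(X, k)

    # Design matrix: intercept plus every column except the response.
    y = X[:, response]
    A = np.column_stack([np.ones(len(X)), np.delete(X, response, axis=1)])

    # Ordinary least squares fitted on the boxplot-clean rows only.
    beta, *_ = np.linalg.lstsq(A[clean], y[clean], rcond=None)

    # Residuals for all rows; the cut-off comes from the clean rows.
    resid = np.abs(y - A @ beta)
    cutoff = resid[clean].max()
    return resid > cutoff, cutoff


# Small demonstration on synthetic data with one planted outlier.
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * x[:, 0] - x[:, 1] + 0.1 * rng.normal(size=200)
data = np.column_stack([y, x])
data[0] = [50.0, 0.0, 0.0]  # planted outlier in the response
flags, cutoff = detect_outliers(data, response=0)
print("cut-off:", cutoff, "flagged rows:", np.flatnonzero(flags))
```

By construction, only rows that fall outside a boxplot fence can be flagged in this sketch, because every clean row has a residual no larger than the cut-off. Rows outside the fences that still follow the fitted regression are not flagged.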
Pages: 2725 - 2732
Number of pages: 8
Related papers
50 in total
  • [1] Detection of Outliers Method in Grouped Multivariate Data: A Method Based on Multiple Linear Regression
    Phuttisen, Suthat
    Srisodaphol, Wuttichai
    PAKISTAN JOURNAL OF STATISTICS AND OPERATION RESEARCH, 2024, 20 (03) : 445 - 453
  • [2] Robust detection of multiple outliers in grouped multivariate data
    Caroni, Chrys
    Billor, Nedret
    JOURNAL OF APPLIED STATISTICS, 2007, 34 (10) : 1241 - 1250
  • [3] Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
    He, Yong
    Cui, Zhongmin
    Fang, Yu
    Chen, Hanwei
    APPLIED PSYCHOLOGICAL MEASUREMENT, 2013, 37 (07) : 522 - 540
  • [4] Composite Imputation Method for the Multiple Linear Regression with Missing at Random Data
    Thongsri, Thidarat
    Samart, Klairung
    INTERNATIONAL JOURNAL OF MATHEMATICS AND COMPUTER SCIENCE, 2022, 17 (01) : 51 - 62
  • [5] Financial Data Quality Evaluation Method Based on Multiple Linear Regression
    Li, Meng
    Liu, Jiqiang
    Yang, Yeping
    FUTURE INTERNET, 2023, 15 (10)
  • [6] Statistical Learning and Multiple Linear Regression Model for Network Selection using MIH
    Rahil, Ahmad
    Mbarek, Nader
    Togni, Olivier
    Atieh, Mirna
    Fouladkar, Ali
    2014 THIRD INTERNATIONAL CONFERENCE ON E-TECHNOLOGIES AND NETWORKS FOR DEVELOPMENT (ICEND), 2014
  • [7] Predictive Big Data Analytics Using Multiple Linear Regression Model
    Khine, Kyi Lai Lai
    Nyunt, Thi Thi Soe
    BIG DATA ANALYSIS AND DEEP LEARNING APPLICATIONS, 2019, 744 : 9 - 19
  • [8] Identification and classification of multiple outliers, high leverage points and influential observations in linear regression
    Nurunnabi, A. A. M.
    Nasser, M.
    Imon, A. H. M. R.
    JOURNAL OF APPLIED STATISTICS, 2016, 43 (03) : 509 - 525
  • [9] Multiple linear regression modeling for compositional data
    Wang, Huiwen
    Shangguan, Liying
    Wu, Junjie
    Guan, Rong
    NEUROCOMPUTING, 2013, 122 : 490 - 500
  • [10] Robust estimation in linear regression models for longitudinal data with covariate measurement errors and outliers
    Zhang, Yuexia
    Qin, Guoyou
    Zhu, Zhongyi
    Zhang, Jiajia
    JOURNAL OF MULTIVARIATE ANALYSIS, 2018, 168 : 261 - 275