Data-driven cluster analysis method: a novel outliers detection method in multivariate data

Cited by: 0
Authors
Duarte, A. R. [1]
Barbosa, J. J. [1]
Martins, H. S. R. [1]
Oliveira, F. L. P. [1]
Affiliation
[1] Univ Fed Ouro Preto, Stat Dept, Ouro Preto, Brazil
Keywords
Data-driven; Multivariate outliers; Cluster analysis; Bayesian information criterion; Accuracy; Mahalanobis distance; Identification
DOI
10.1080/03610918.2024.2376872
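Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714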
Abstract
Detection of multivariate outliers is crucial in statistical studies: analyses carried out without identifying possible outliers may yield incorrect results. This study proposes a new technique for detecting multivariate outliers based on cluster analysis. The method relies on information inherent in the data itself. We compare it with three widely used detection methods, all based on the Mahalanobis distance. Sensitivity, specificity, and accuracy are used to assess the quality of the methods, together with an analysis of the CPU time required to run each procedure. The new technique showed clear superiority over the others.
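The three quality measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `detection_metrics` and the example labels are hypothetical, and it assumes the true outlier status of every observation is known, as it would be in a simulation study.

```python
def detection_metrics(is_outlier, flagged):
    """Return (sensitivity, specificity, accuracy) for an outlier detector.

    is_outlier: ground-truth booleans (True = genuine outlier)
    flagged:    detector output booleans (True = flagged as outlier)
    """
    if len(is_outlier) != len(flagged):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(is_outlier, flagged))
    tp = sum(1 for t, f in pairs if t and f)          # outliers correctly flagged
    tn = sum(1 for t, f in pairs if not t and not f)  # regular points left alone
    fp = sum(1 for t, f in pairs if not t and f)      # regular points wrongly flagged
    fn = sum(1 for t, f in pairs if t and not f)      # outliers that were missed
    # Each ratio is undefined (NaN) when its denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(pairs) if pairs else float("nan")
    return sensitivity, specificity, accuracy


# Hypothetical labels: 2 genuine outliers among 10 observations.
truth = [True, True, False, False, False, False, False, False, False, False]
flags = [True, False, True, False, False, False, False, False, False, False]
sens, spec, acc = detection_metrics(truth, flags)
```

Sensitivity is the share of genuine outliers that were flagged, specificity is the share of regular observations left unflagged, and accuracy is the share of all observations classified correctly.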
Pages: 21