Data-driven cluster analysis method: a novel outliers detection method in multivariate data

Authors
Duarte, A. R. [1]
Barbosa, J. J. [1]
Martins, H. S. R. [1]
Oliveira, F. L. P. [1]
Affiliation
[1] Univ Fed Ouro Preto, Stat Dept, Ouro Preto, Brazil
Keywords
Data-driven; Multivariate outliers; Cluster analysis; Bayesian information criterion; Accuracy; Mahalanobis distance; Identification
DOI
10.1080/03610918.2024.2376872
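Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714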
Abstract
Detecting multivariate outliers is crucial in statistical studies, because statistical analyses carried out without identifying possible outliers may produce incorrect results. This study proposes a new technique for detecting multivariate outliers based on cluster analysis. The method draws on information inherent in the data itself. We compare the methodology with three widely used detection methods; the comparison considers detection techniques based on the Mahalanobis distance. Sensitivity, specificity, and accuracy are used to assess the quality of the methods, together with an analysis of the CPU time each procedure requires. The new technique showed marked superiority over the others.
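The abstract does not describe the proposed cluster-based algorithm, so the sketch below illustrates only the evaluation setting it mentions. It implements a classical Mahalanobis-distance detector (sample mean, sample covariance, chi-square cutoff) of the kind used as a comparison method, and scores it with sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and accuracy = (TP+TN)/n, treating "outlier" as the positive class. This is not the authors' method. The function names, the 0.975 cutoff level, and the toy data are assumptions chosen for illustration, and the closed-form cutoff used here is valid only for two variables.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean_vector(X: Sequence[Sequence[float]]) -> List[float]:
    """Column means of an n x p data matrix."""
    n, p = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]


def covariance(X: Sequence[Sequence[float]], mu: Sequence[float]) -> Matrix:
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, p = len(X), len(mu)
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - mu[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b]
    return [[S[a][b] / (n - 1) for b in range(p)] for a in range(p)]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [list(A[i]) + [b[i]] for i in range(p)]  # augmented matrix [A | b]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * p
    for i in range(p - 1, -1, -1):  # back-substitution on the upper triangle
        x[i] = (M[i][p] - sum(M[i][j] * x[j] for j in range(i + 1, p))) / M[i][i]
    return x


def mahalanobis_sq(X: Sequence[Sequence[float]]) -> List[float]:
    """Squared Mahalanobis distance of each row from the sample mean."""
    mu = mean_vector(X)
    S = covariance(X, mu)
    out = []
    for row in X:
        d = [row[j] - mu[j] for j in range(len(mu))]
        z = solve(S, d)  # z = S^{-1} d, computed without forming the inverse
        out.append(sum(dj * zj for dj, zj in zip(d, z)))
    return out


def detection_metrics(truth: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy, with 'outlier' as the positive class."""
    tp = sum(t and f for t, f in zip(truth, flagged))
    tn = sum((not t) and (not f) for t, f in zip(truth, flagged))
    fp = sum((not t) and f for t, f in zip(truth, flagged))
    fn = sum(t and (not f) for t, f in zip(truth, flagged))
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    specificity = tn / (tn + fp) if tn + fp else math.nan
    accuracy = (tp + tn) / len(truth)
    return sensitivity, specificity, accuracy


# Hypothetical toy data: 20 inliers at the corners of a square plus one planted outlier.
corners = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
X = [list(c) for _ in range(5) for c in corners] + [[21.0, 0.0]]
truth = [False] * 20 + [True]

# 0.975 chi-square quantile; the closed form -2 ln(0.025) holds only for p = 2.
cutoff = -2.0 * math.log(0.025)
d2 = mahalanobis_sq(X)
flagged = [v > cutoff for v in d2]
print(detection_metrics(truth, flagged))  # (1.0, 1.0, 1.0)
```

With a sample covariance estimated from the same data, the squared distances always sum to p(n - 1), which makes a cheap sanity check. For other dimensions the cutoff would come from a chi-square quantile function such as SciPy's `scipy.stats.chi2.ppf`. Because the classical mean and covariance are not robust, several outliers can inflate the covariance and mask one another, a known weakness of this kind of baseline.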
Pages: 21