RODD: Robust Outlier Detection in Data Cubes

Cited by: 1
Authors
Kuhlmann, Lara [2, 3]
Wilmes, Daniel [1]
Mueller, Emmanuel [1, 4]
Pauly, Markus [2, 4]
Horn, Daniel [2, 4]
Affiliations
[1] TU Dortmund Univ, Dept Comp Sci, Dortmund, Germany
[2] TU Dortmund Univ, Dept Stat, Dortmund, Germany
[3] TU Dortmund Univ, Grad Sch Logist, Dept Mech Engn, Dortmund, Germany
[4] TU Dortmund Univ, Res Ctr Trustworthy Data Sci & Secur, Dortmund, Germany
Source
BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2023 | 2023 / Vol. 14148
Keywords
Outlier Detection; Data Cubes; Categorical Data; Random Forest
DOI
10.1007/978-3-031-39831-5_30
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data cubes are multidimensional databases, often built from several separate databases, that serve as a flexible basis for data analysis. Surprisingly, outlier detection on data cubes has not yet been treated extensively. In this work, we provide the first framework to evaluate robust outlier detection methods in data cubes (RODD). We introduce a novel random forest-based outlier detection approach (RODD-RF) and compare it with more traditional methods based on robust location estimators. We propose a general type of test data and examine all methods in a simulation study. Moreover, we apply RODD-RF to real-world data. The results show that RODD-RF leads to improved outlier detection.
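The abstract contrasts RODD-RF with "more traditional methods based on robust location estimators" without describing them. As a rough illustration of that baseline family only (not the authors' method, and not RODD-RF), the sketch below flags data cube cells whose measure lies far from the median of their group, scaled by the median absolute deviation (MAD). The function name `robust_flags`, the grouping rule (cells sharing all dimension values except the last), the consistency factor 1.4826 and the cut-off 3.5 are assumptions chosen for illustration.

```python
from statistics import median


def mad(values, center):
    """Median absolute deviation of `values` around `center`."""
    return median(abs(v - center) for v in values)


def robust_flags(cube, threshold=3.5):
    """Flag cells whose measure deviates strongly from their group median.

    Hypothetical baseline: `cube` maps a tuple of categorical dimension
    values to a numeric measure. Cells are grouped by all dimensions except
    the last one, so each group is compared along the last dimension.
    """
    groups = {}
    for key, value in cube.items():
        groups.setdefault(key[:-1], []).append(value)

    flags = {}
    for key, value in cube.items():
        values = groups[key[:-1]]
        center = median(values)
        # 1.4826 makes the MAD consistent with the standard deviation
        # for normally distributed data.
        scale = 1.4826 * mad(values, center)
        if scale == 0:
            # Degenerate group: any deviation from the median is flagged.
            flags[key] = value != center
        else:
            flags[key] = abs(value - center) / scale > threshold
    return flags


if __name__ == "__main__":
    # Hypothetical cube with dimensions (region, product, month) -> sales.
    cube = {("North", "A", m): v
            for m, v in zip(range(1, 7), [100, 102, 98, 101, 99, 500])}
    flags = robust_flags(cube)
    print([k for k, f in flags.items() if f])  # → [('North', 'A', 6)]
```

Median and MAD are used instead of mean and standard deviation because a single extreme cell (here the value 500) barely shifts them, so it cannot mask itself.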
Pages: 325-339
Number of pages: 15