Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm

Cited by: 0
Authors
Tsagris, Michail [1 ]
Papadakis, Manos [2 ]
Alenazi, Abdulaziz [2 ]
Alzeley, Omar [3 ]
Affiliations
[1] Univ Crete, Dept Econ, Gallos Campus, Rethimnon 74100, Greece
[2] Northern Border Univ, Coll Sci, Dept Math, Ar Ar 73213, Saudi Arabia
[3] Umm Al Qura Univ, Al Qunfudah Univ Coll, Dept Math, Mecca 24382, Saudi Arabia
Keywords
high-dimensional data; outliers; computational efficiency; 6208;
DOI
10.3390/computation12090185
Chinese Library Classification
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Outlier detection, or anomaly detection as it is known in the machine learning community, has gained interest in recent years, and it is commonly used when the sample size is smaller than the number of variables. In 2015, an outlier detection procedure was proposed [7] for this high-dimensional setting, replacing the classic minimum covariance determinant estimator with the minimum diagonal product estimator. Computationally speaking, this method has two drawbacks: (a) it is not computationally efficient and does not scale up, and (b) it is not memory efficient and, in some cases, cannot be applied at all because of memory limits. We address the first issue via efficient code written in both R and C++, whereas for the second issue we utilize the eigendecomposition and its properties. Experiments are conducted using simulated data to showcase the time improvement, while gene expression data are used to further examine some extra practicalities associated with the algorithm. The simulation studies yield a speed-up factor that ranges between 17 and 1800, implying a successful reduction in the estimator's computational burden.
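The idea of replacing a covariance determinant with a diagonal product can be illustrated with a short sketch. The Python code below is a simplified illustration only, not the authors' R/C++ implementation: it searches for a subset of h observations whose per-variable variances have the smallest product, using a concentration-style iteration with distances that rely solely on the diagonal of the covariance matrix. The function names, the number of random starts, and the toy data are all assumptions made for illustration, and the refinement and thresholding steps of the published procedure are omitted.

```python
import math
import random


def column_means(rows):
    """Mean of every variable (column) over the given rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def column_variances(rows, means):
    """Unbiased variance of every variable: the covariance diagonal."""
    n = len(rows)
    return [sum((x - m) ** 2 for x in col) / (n - 1)
            for col, m in zip(zip(*rows), means)]


def diagonal_distances(data, means, variances):
    """Squared distances that use only the diagonal of the covariance."""
    return [sum((x - m) ** 2 / v for x, m, v in zip(row, means, variances))
            for row in data]


def mdp_subset(data, h, n_starts=20, max_iter=50, seed=0):
    """Hypothetical helper: look for the h-subset with the smallest
    product of variances (compared on the log scale to avoid underflow)."""
    rng = random.Random(seed)
    n = len(data)
    best_subset, best_logprod = None, math.inf
    for _ in range(n_starts):
        subset = sorted(rng.sample(range(n), h))
        for _ in range(max_iter):
            rows = [data[i] for i in subset]
            means = column_means(rows)
            variances = column_variances(rows, means)
            dist = diagonal_distances(data, means, variances)
            # Concentration step: keep the h observations closest to the centre.
            new_subset = sorted(sorted(range(n), key=dist.__getitem__)[:h])
            if new_subset == subset:
                break
            subset = new_subset
        rows = [data[i] for i in subset]
        means = column_means(rows)
        logprod = sum(math.log(v) for v in column_variances(rows, means))
        if logprod < best_logprod:
            best_subset, best_logprod = subset, logprod
    return best_subset, best_logprod


# Toy data (assumed): n = 20 observations, p = 50 variables, so n < p.
demo_rng = random.Random(1)
data = [[demo_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]
for i in range(3):  # shift the first three observations to act as outliers
    data[i] = [x + 6.0 for x in data[i]]

subset, logprod = mdp_subset(data, h=11)
print("observations left out of the subset:",
      sorted(set(range(len(data))) - set(subset)))
```

Because only p variances are stored rather than a p × p matrix, a sketch like this stays cheap in memory even when the number of variables is much larger than the sample size.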
Pages: 10
Related papers
50 in total
  • [21] Clustering algorithm of high-dimensional data based on units
    School of Information Engineering, Hubei Institute for Nationalities, Enshi 445000, China
    Jisuanji Yanjiu yu Fazhan, 2007, 9 : 1618 - 1623
  • [22] Outlier detection based on variance of angle in high dimensional data
    Liu, Wenting
    SIXTH INTERNATIONAL CONFERENCE ON ELECTRONICS AND INFORMATION ENGINEERING, 2015, 9794
  • [23] Efficient feature selection filters for high-dimensional data
    Ferreira, Artur J.
    Figueiredo, Mario A. T.
    PATTERN RECOGNITION LETTERS, 2012, 33 (13) : 1794 - 1804
  • [24] Multiple change point detection for high-dimensional data
    Zhao, Wenbiao
    Zhu, Lixing
    Tan, Falong
    TEST, 2024, 33 (03) : 809 - 846
  • [25] Efficient feature selection for high-dimensional data using two-level filter
    Li, Y
    Wu, ZF
    Liu, JM
    Tang, YY
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1711 - 1716
  • [26] Subspace Outlier Detection in High Dimensional Data using Ensemble of PCA-based Subspaces
    Riahi-Madvar, Mahboobeh
    Nasersharif, Babak
    Azirani, Ahmad Akbari
    2021 26TH INTERNATIONAL COMPUTER CONFERENCE, COMPUTER SOCIETY OF IRAN (CSICC), 2021,
  • [27] Efficient kNN Join over Dynamic High-Dimensional Data
    Ukey, Nimish
    Yang, Zhengyi
    Zhang, Guangjian
    Liu, Boge
    Li, Binghao
    Zhang, Wenjie
    DATABASES THEORY AND APPLICATIONS (ADC 2022), 2022, 13459 : 63 - 75
  • [28] Efficient Parallel Skyline Query Processing for High-Dimensional Data
    Tang, Mingjie
    Yu, Yongyang
    Aref, Walid G.
    Malluhi, Qutaibah M.
    Ouzzani, Mourad
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (10) : 1838 - 1851
  • [29] A Modified Apriori Algorithm for Analysing High-Dimensional Gene Data
    Pommerenke, Claudia
    Friedrich, Benedikt
    Johl, Thorsten
    Jaensch, Lothar
    Haeussler, Susanne
    Klawonn, Frank
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2011, 2011, 6936 : 236 - +
  • [30] Autoencoder-based outlier detection for sparse, high dimensional data
    Chen, Wanghu
    Li, Huijun
    Li, Jing
    Arshad, Ali
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 2735 - 2742