Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm

Cited by: 0

Authors:

Tsagris, Michail [1]

Papadakis, Manos [2]

Alenazi, Abdulaziz [2]

Alzeley, Omar [3]

Affiliations:

[1] Univ Crete, Dept Econ, Gallos Campus, Rethimnon 74100, Greece

[2] Northern Border Univ, Coll Sci, Dept Math, Ar Ar 73213, Saudi Arabia

[3] Umm Al Qura Univ, Al Qunfudah Univ Coll, Dept Math, Mecca 24382, Saudi Arabia

Source:

COMPUTATION | 2024 / Vol. 12 / Issue 09

Keywords:

high-dimensional data; outliers; computational efficiency; 6208;

DOI:

10.3390/computation12090185

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Subject classification code:

0701; 070101;

Abstract:

Outlier detection, or anomaly detection as it is known in the machine learning community, has attracted growing interest in recent years and is commonly applied in settings where the sample size is smaller than the number of variables. In 2015, an outlier detection procedure was proposed [7] for this high-dimensional setting that replaces the classic minimum covariance determinant (MCD) estimator with the minimum diagonal product (MDP) estimator. Computationally, that method has two drawbacks: (a) it is not computationally efficient and does not scale well, and (b) it is not memory efficient and, in some cases, cannot be applied at all because of memory limits. We address the first issue with efficient code written in both R and C++, and the second by exploiting the eigendecomposition and its properties. Experiments on simulated data demonstrate the reduction in computing time, while gene expression data are used to examine further practical aspects of the algorithm. The simulation studies yield speed-up factors ranging from 17 to 1800, showing a substantial reduction in the estimator's computational burden.
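The abstract names two ingredients without spelling them out: a subset estimator that minimizes a product of per-variable (diagonal) variances rather than a covariance determinant, and an eigendecomposition property that sidesteps memory limits. The pure-Python sketch below illustrates one possible reading of each; it is not the authors' R/C++ code. It assumes that the MDP subset is refined by concentration steps (repeatedly keep the h rows closest to the current subset under a diagonal-only distance), and that the memory-heavy quantity is something like tr(R^2) for a p x p correlation matrix R. Because Z^T Z and Z Z^T share their nonzero eigenvalues, such a trace can be computed from an h x h matrix instead. All function names, the subset size h = n//2 + 1, and the simulated data are hypothetical.

```python
# Illustrative sketch only: hypothetical names and defaults, not the authors' R/C++ code.
import math
import random


def col_stats(rows):
    """Column means and variances (divisor len(rows)) of equal-length rows."""
    h, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / h for j in range(p)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / h for j in range(p)]
    return means, variances


def diag_distances(X, means, variances):
    """Squared distances using only the diagonal of the covariance matrix,
    so no p x p matrix is formed. Assumes every variance is positive."""
    return [sum((xj - m) ** 2 / v for xj, m, v in zip(x, means, variances)) for x in X]


def log_diag_product(variances):
    """Log of the product of the variances (logs avoid overflow/underflow for large p)."""
    return sum(math.log(v) for v in variances)


def mdp_csteps(X, h, start, max_iter=100):
    """Concentration steps on the product of the subset's column variances.

    Each step keeps the h rows closest (in diagonal distance) to the current
    subset, which cannot increase the variance product. Stops when the subset
    no longer changes or after max_iter steps."""
    subset = sorted(start)
    means, variances = col_stats([X[i] for i in subset])
    history = [log_diag_product(variances)]
    for _ in range(max_iter):
        d = diag_distances(X, means, variances)
        new = sorted(sorted(range(len(X)), key=lambda i: d[i])[:h])
        if new == subset:
            break
        subset = new
        means, variances = col_stats([X[i] for i in subset])
        history.append(log_diag_product(variances))
    return subset, means, variances, history


def trace_r2_gram(Z):
    """tr(R^2) for the p x p matrix R = Z^T Z / h, computed from the h x h Gram matrix.

    Z^T Z and Z Z^T share their nonzero eigenvalues, so tr((Z^T Z)^2) equals
    tr((Z Z^T)^2); memory is O(h^2) instead of O(p^2)."""
    h = len(Z)
    G = [[sum(a * b for a, b in zip(Z[i], Z[k])) for k in range(h)] for i in range(h)]
    return sum(g * g for row in G for g in row) / h ** 2


def trace_r2_naive(Z):
    """The same quantity via the full p x p matrix R (for checking small cases only)."""
    h, p = len(Z), len(Z[0])
    R = [[sum(z[j] * z[k] for z in Z) / h for k in range(p)] for j in range(p)]
    return sum(r * r for row in R for r in row)


if __name__ == "__main__":
    random.seed(1)
    n, p = 20, 200                 # n < p: the high-dimensional setting
    h = n // 2 + 1                 # assumed subset size, roughly half the sample
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    for i in range(3):             # shift rows 0-2 so they act as outliers
        X[i] = [v + 5 for v in X[i]]
    subset, means, variances, history = mdp_csteps(X, h, random.sample(range(n), h))
    # standardize the selected rows so that R = Z^T Z / h is their correlation matrix
    Z = [[(X[i][j] - means[j]) / math.sqrt(variances[j]) for j in range(p)] for i in subset]
    print("selected rows:", subset)
    print("tr(R^2):", trace_r2_gram(Z), trace_r2_naive(Z))
```

Pure Python keeps the sketch free of dependencies. For realistic n and p, a vectorized linear-algebra library would replace the nested loops, and the comparison against `trace_r2_naive` is feasible here only because p is small.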

Pages: 10
