Robust archetypoids for anomaly detection in big functional data

Cited by: 16
Authors
Vinue, Guillermo [1]
Epifanio, Irene [2]
Affiliations
[1] Katholieke Univ Leuven, Leuven, Belgium
[2] Univ Jaume 1, Castellon De La Plana, Spain
Keywords
Anomaly detection; Functional data analysis; Archetypal analysis; Big data; R package; OUTLIER DETECTION; R PACKAGE; MULTIVARIATE; LOCATION; SERIES;
DOI
10.1007/s11634-020-00412-9
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208; 070103; 0714
Abstract
Archetypoid analysis (ADA) has proven to be a successful unsupervised statistical technique to identify extreme observations in the periphery of the data cloud, both in classical multivariate data and functional data. However, two questions remain open in this field: the use of ADA for outlier detection and its scalability. We propose to use robust functional archetypoids and the adjusted boxplot to pinpoint functional outliers. Furthermore, we present a new archetypoid algorithm for obtaining results from large data sets in reasonable time. Functional time series occur in many practical problems, so this paper focuses on functional data settings. The new algorithm for detecting functional anomalies, called CRO-FADALARA, can be used with both univariate and multivariate curves. Our proposal for outlier detection is compared with all the state-of-the-art methods in a controlled study, showing good performance. Furthermore, CRO-FADALARA is applied to two large time series data sets, where outlier curves are discussed and the reduction in computational time is clearly stated. A third case study with a small ECG data set is discussed, given its importance in functional data scenarios. All data, R code and a new R package are freely available.
Pages: 437-462
Number of pages: 26