Robust archetypoids for anomaly detection in big functional data

Cited by: 16
Authors
Vinue, Guillermo [1]
Epifanio, Irene [2]
Affiliations
[1] Katholieke Univ Leuven, Leuven, Belgium
[2] Univ Jaume 1, Castellon De La Plana, Spain
Keywords
Anomaly detection; Functional data analysis; Archetypal analysis; Big data; R package; OUTLIER DETECTION; R PACKAGE; MULTIVARIATE; LOCATION; SERIES;
DOI
10.1007/s11634-020-00412-9
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Archetypoid analysis (ADA) has proven to be a successful unsupervised statistical technique to identify extreme observations in the periphery of the data cloud, both in classical multivariate data and functional data. However, two questions remain open in this field: the use of ADA for outlier detection and its scalability. We propose to use robust functional archetypoids and the adjusted boxplot to pinpoint functional outliers. Furthermore, we present a new archetypoid algorithm for obtaining results from large data sets in reasonable time. Functional time series occur in many practical problems, so this paper focuses on functional data settings. The new algorithm for detecting functional anomalies, called CRO-FADALARA, can be used with both univariate and multivariate curves. Our proposal for outlier detection is compared with all the state-of-the-art methods in a controlled study, showing good performance. Furthermore, CRO-FADALARA is applied to two large time series data sets, where outlier curves are discussed and the reduction in computational time is clearly stated. A third case study with a small ECG data set is discussed, given its importance in functional data scenarios. All data, R code and a new R package are freely available.
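The adjusted-boxplot step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each curve has already been reduced to a single numeric score, for example a reconstruction error from an archetypoid fit; that reduction is an assumption made here, not a detail given in the abstract. The function names `medcouple`, `adjusted_fences` and `flag_outliers` are hypothetical and do not come from the authors' R package. The fences follow the commonly used skewness-adjusted boxplot rule, with a naive quadratic-time medcouple that is only suitable for small samples.

```python
import math
from statistics import median, quantiles


def medcouple(values):
    """Naive O(n^2) medcouple: a robust skewness measure in [-1, 1]."""
    xs = sorted(values)
    med = median(xs)
    lower = [x for x in xs if x <= med]
    upper = [x for x in xs if x >= med]
    kernel = []
    for xj in upper:
        for xi in lower:
            if xi == xj:
                continue  # both equal the median; handled by the tie rule below
            kernel.append(((xj - med) - (med - xi)) / (xj - xi))
    # Tie rule for observations equal to the median: -1, 0 or +1 by rank.
    ties = xs.count(med)
    for a in range(ties):
        for b in range(ties):
            s = a + b - (ties - 1)
            kernel.append((s > 0) - (s < 0))
    return median(kernel)


def adjusted_fences(values, k=1.5):
    """Skewness-adjusted boxplot fences (lower, upper) for a list of scores."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(values)
    if mc >= 0:
        return (q1 - k * math.exp(-4 * mc) * iqr,
                q3 + k * math.exp(3 * mc) * iqr)
    return (q1 - k * math.exp(-3 * mc) * iqr,
            q3 + k * math.exp(4 * mc) * iqr)


def flag_outliers(scores):
    """Return the indices of scores that fall outside the adjusted fences."""
    lo, hi = adjusted_fences(scores)
    return [i for i, s in enumerate(scores) if s < lo or s > hi]


# Hypothetical per-curve scores; the last curve is far from the rest.
scores = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 50.0]
print(flag_outliers(scores))
```

For symmetric scores the medcouple is close to zero and the fences reduce to the classical 1.5 × IQR rule; for skewed scores the exponential factors widen the fence on the long-tailed side, so fewer regular observations in that tail are flagged.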
Pages: 437-462
Number of pages: 26
Related papers
50 in total
  • [1] Robust archetypoids for anomaly detection in big functional data
    Guillermo Vinue
    Irene Epifanio
    Advances in Data Analysis and Classification, 2021, 15 : 437 - 462
  • [2] A Rapid Anomaly Detection Technique for Big Data Curation
    Poonsirivong, Korn
    Jittawiriyanukoon, Chanintorn
    PROCEEDINGS OF 2017 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2017,
  • [3] Robust Anomaly Detection Algorithms for Real-time Big Data: Comparison of algorithms
    Hasani, Zirije
    2017 6TH MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 2017, : 449 - 454
  • [4] Anomaly Detection Guidelines for Data Streams in Big Data
    Rana, Annie Ibrahim
    Estrada, Giovani
    Sole, Marc
    Muntes, Victor
    2016 3RD INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI 2016), 2016, : 94 - 98
  • [5] Anomaly Detection in Big Data with Separable Compressive Sensing
    Wang, Wei
    Wang, Dan
    Jiang, Shu
    Qin, Shan
    Xue, Lei
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, 2016, 386 : 589 - 594
  • [6] Online Anomaly Detection in Big Data
    Balasingam, B.
    Sankavaram, M. S.
    Choi, K.
    Ayala, D. F. M.
    Sidoti, D.
    Pattipati, K.
    Willett, P.
    Lintz, C.
    Commeau, G.
    Dorigo, F.
    Fahrny, J.
    2014 17TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2014,
  • [7] Anomaly detection in big data from UWB radars
    Wang, Wei
    Zhou, Xin
    Zhang, Baoju
    Mu, Jiasong
    SECURITY AND COMMUNICATION NETWORKS, 2015, 8 (14) : 2469 - 2475
  • [8] Robust Anomaly Detection on Unreliable Data
    Zhao, Zilong
    Cerf, Sophie
    Birke, Robert
    Robu, Bogdan
    Bouchenak, Sara
    Ben Mokhtar, Sonia
    Chen, Lydia Y.
    2019 49TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN 2019), 2019, : 630 - 637
  • [9] Anomaly Detection for Big Log Data Using a Hadoop Ecosystem
    Son, Siwoon
    Gil, Myeong-Seon
    Moon, Yang-Sae
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2017, : 377 - 380
  • [10] Developing Big Data anomaly dynamic and static detection algorithms: AnomalyDSD spark package
    Garcia-Gil, Diego
    Lopez, David
    Arguelles-Martino, Daniel
    Carrasco, Jacinto
    Aguilera-Martos, Ignacio
    Luengo, Julian
    Herrera, Francisco
    INFORMATION SCIENCES, 2025, 690