Robust archetypoids for anomaly detection in big functional data

Cited by: 16
Authors
Vinue, Guillermo [1]
Epifanio, Irene [2]
Affiliations
[1] Katholieke Univ Leuven, Leuven, Belgium
[2] Univ Jaume I, Castellon De La Plana, Spain
Keywords
Anomaly detection; Functional data analysis; Archetypal analysis; Big data; R package; OUTLIER DETECTION; R PACKAGE; MULTIVARIATE; LOCATION; SERIES;
DOI
10.1007/s11634-020-00412-9
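Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Discipline classification code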
020208 ; 070103 ; 0714 ;
Abstract
Archetypoid analysis (ADA) has proven to be a successful unsupervised statistical technique for identifying extreme observations on the periphery of the data cloud, in both classical multivariate data and functional data. However, two questions remain open in this field: the use of ADA for outlier detection and its scalability. We propose using robust functional archetypoids and the adjusted boxplot to pinpoint functional outliers. Furthermore, we present a new archetypoid algorithm for obtaining results from large data sets in reasonable time. Functional time series occur in many practical problems, so this paper focuses on functional data settings. The new algorithm for detecting functional anomalies, called CRO-FADALARA, can be used with both univariate and multivariate curves. Our proposal for outlier detection is compared with all the state-of-the-art methods in a controlled study, showing good performance. Furthermore, CRO-FADALARA is applied to two large time series data sets, where the outlier curves are discussed and the reduction in computational time is clearly stated. A third case study with a small ECG data set is discussed, given its importance in functional data scenarios. All data, R code and a new R package are freely available.
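The abstract mentions the adjusted boxplot as the rule for flagging outliers. The following is a minimal Python sketch of the standard adjusted boxplot fences (skewness-corrected via the medcouple), not the paper's R implementation. It assumes each curve has already been reduced to a single hypothetical scalar score (for example, some approximation error with respect to the archetypoids); the function names `medcouple`, `adjusted_fences` and `flag_outliers` are illustrative, and the medcouple here is a naive O(n^2) version that simply skips pairs tied at the median.

```python
import math
import statistics


def medcouple(values):
    """Naive O(n^2) medcouple, a robust skewness measure in [-1, 1].

    Simplification (assumption): pairs where both points equal the
    median are skipped instead of using the special tie kernel.
    """
    xs = sorted(values)
    med = statistics.median(xs)
    lower = [x for x in xs if x <= med]
    upper = [x for x in xs if x >= med]
    kernel = []
    for xi in lower:
        for xj in upper:
            if xj == xi:
                continue  # both equal the median; skipped in this sketch
            kernel.append(((xj - med) - (med - xi)) / (xj - xi))
    return statistics.median(kernel) if kernel else 0.0


def adjusted_fences(values):
    """Return (lower, upper) fences of the skewness-adjusted boxplot."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(values)
    if mc >= 0:
        lower = q1 - 1.5 * math.exp(-4 * mc) * iqr
        upper = q3 + 1.5 * math.exp(3 * mc) * iqr
    else:
        lower = q1 - 1.5 * math.exp(-3 * mc) * iqr
        upper = q3 + 1.5 * math.exp(4 * mc) * iqr
    return lower, upper


def flag_outliers(scores):
    """Indices of scores falling outside the adjusted fences."""
    lower, upper = adjusted_fences(scores)
    return [i for i, s in enumerate(scores) if s < lower or s > upper]


if __name__ == "__main__":
    # Hypothetical per-curve scores; the last one is clearly extreme.
    scores = list(range(1, 12)) + [1000]
    print(flag_outliers(scores))
```

When the medcouple is zero the fences reduce to the classical 1.5 × IQR rule; for skewed score distributions the exponential factors widen the fence on the long-tailed side, which reduces false alarms there.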
Pages: 437-462
Page count: 26
Related papers
50 in total
  • [21] Robust local outlier detection with statistical parameter for big data
    Lei, Jingsheng
    Jiang, Teng
    Wu, Kui
    Du, Haizhou
    Zhu, Lin
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2015, 30 (05): : 411 - 419
  • [22] Big Data Driven Anomaly Detection for Cellular Networks
    Zhu, Qiqi
    Sun, Li
    IEEE ACCESS, 2020, 8 : 31398 - 31408
  • [23] Intelligent Big Data Summarization for Rare Anomaly Detection
    Ahmed, Mohiuddin
    IEEE ACCESS, 2019, 7 : 68669 - 68677
  • [24] Statistical wavelet-based anomaly detection in big data with compressive sensing
    Wang, Wei
    Lu, Dunqiang
    Zhou, Xin
    Zhang, Baoju
    Mu, Jiasong
    EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, 2013
  • [25] Multi-level anomaly detection: Relevance of big data analytics in networks
    Sait, Saad Y.
    Bhandari, Akshay
    Khare, Shreya
    James, Cyriac
    Murthy, Hema A.
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2015, 40 (06): : 1737 - 1767
  • [26] The Construction of Rural Poverty Alleviation Audit using Big Data Anomaly Detection
    Zhang, Lifeng
    PROCEEDINGS OF THE 2021 FIFTH INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC 2021), 2021, : 879 - 882
  • [27] Statistical wavelet-based anomaly detection in big data with compressive sensing
    Wei Wang
    Dunqiang Lu
    Xin Zhou
    Baoju Zhang
    Jiasong Mu
    EURASIP Journal on Wireless Communications and Networking, 2013
  • [28] Multi-level anomaly detection: Relevance of big data analytics in networks
    Sait S.
    Bhandari A.
    Khare S.
    James C.
    Murthy H.
    Sadhana, 2015, 40 (6) : 1737 - 1767
  • [29] A Theoretical Study of Anomaly Detection in Big Data Distributed Static and Stream Analytics
    Amen, Bakhtiar
    Grigoris, Antonio
    IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2018, : 1177 - 1182
  • [30] Big Data Analytics for Network Anomaly Detection from Netflow Data
    Terzi, Duygu Sinanc
    Terzi, Ramazan
    Sagiroglu, Seref
    2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 592 - 597