Nonparametric Sequential Clustering of Data Streams with Composite Distributions

Cited by: 1
Authors
Sreenivasan, Sreeram C. [1]
Bhashyam, Srikrishna [1]
Affiliations
[1] Indian Inst Technol Madras, Chennai 600036, Tamil Nadu, India
Keywords
nonparametric methods; nonparametric clustering; sequential nonparametric testing; sequential hypothesis testing; nonparametric hypothesis testing; sequential decision rules; anomaly detection; ALGORITHM; EFFICIENT
DOI
10.1016/j.sigpro.2022.108827
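Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract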
We study a sequential nonparametric clustering problem to group a finite set of S data streams into K clusters. The data streams are real-valued i.i.d. data sequences generated from unknown continuous distributions. The distributions themselves are organized into clusters according to their proximity to each other based on a certain distance metric. The sequential tests are universal in the sense that they are independent of the underlying configuration of the distribution clusters, and of the distributions themselves, as long as the maximum intra-cluster distance is smaller than the minimum inter-cluster distance. We propose sequential nonparametric clustering tests for two cases: (1) K known and (2) K unknown. In both cases, we show that the proposed sequential nonparametric clustering tests stop in finite time almost surely and are universally exponentially consistent. Further, we also bound the asymptotic growth rate of the expected stopping time as the probability of error goes to zero. Our results generalize earlier work on sequential nonparametric anomaly detection to the more general sequential nonparametric clustering problem. This generalization also provides a new test for the special case of anomaly detection where the anomalous data streams can follow distinct probability distributions. We also devise a modification of the proposed sequential nonparametric clustering tests that can result in significant computational savings with negligible performance degradation. Simulations show that all our proposed sequential clustering tests outperform the corresponding fixed sample size tests in terms of the expected number of samples for a given probability of error. The simulation results also demonstrate the advantage of our proposed clustering tests in anomaly detection problems with distinct anomalies. © 2022 Elsevier B.V. All rights reserved.
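The separation idea in the abstract (maximum intra-cluster distance smaller than minimum inter-cluster distance) can be illustrated with a rough sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the Kolmogorov-Smirnov distance as the metric, single-linkage merging, the stopping rule based on a fixed `gap`, and the names `ks_distance`, `single_linkage`, and `sequential_cluster`. It is not the authors' test, only a toy example of sequentially clustering streams when K is known.

```python
import bisect
import random


def ks_distance(x, y):
    """Kolmogorov-Smirnov distance between the empirical CDFs of two samples."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect.bisect_right(xs, t) / len(xs) - bisect.bisect_right(ys, t) / len(ys))
        for t in xs + ys
    )


def single_linkage(dist, k):
    """Merge the two closest clusters until k remain; clusters are lists of stream indices."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        # Single linkage: cluster distance is the smallest pairwise stream distance.
        _, a, b = min(
            (min(dist[i][j] for i in ca for j in cb), a, b)
            for a, ca in enumerate(clusters)
            for b, cb in enumerate(clusters)
            if a < b
        )
        merged = clusters.pop(b)  # b > a, so index a stays valid after the pop
        clusters[a].extend(merged)
    return sorted(sorted(c) for c in clusters)


def sequential_cluster(streams, k, gap, n_start=20):
    """Hypothetical stopping rule: add one sample per stream until the smallest
    inter-cluster distance exceeds the largest intra-cluster distance by `gap`,
    or until the available samples run out."""
    n_max = min(len(s) for s in streams)
    n = min(n_start, n_max)
    pairs = [(i, j) for i in range(len(streams)) for j in range(i + 1, len(streams))]
    while True:
        prefix = [s[:n] for s in streams]
        dist = [[ks_distance(a, b) for b in prefix] for a in prefix]
        clusters = single_linkage(dist, k)
        label = {i: c for c, members in enumerate(clusters) for i in members}
        intra = max((dist[i][j] for i, j in pairs if label[i] == label[j]), default=0.0)
        inter = min((dist[i][j] for i, j in pairs if label[i] != label[j]), default=float("inf"))
        if inter - intra > gap or n >= n_max:
            return clusters, n
        n += 1


# Toy data (assumed): S = 4 Gaussian streams forming K = 2 well-separated clusters.
random.seed(0)
means = [0.0, 0.0, 4.0, 4.0]
streams = [[random.gauss(m, 1.0) for _ in range(300)] for m in means]
clusters, n_used = sequential_cluster(streams, k=2, gap=0.5)
print(clusters, n_used)
```

The sketch stops early when the streams are easy to separate and falls back to the full sample otherwise, which mirrors the abstract's comparison between sequential and fixed sample size tests only in spirit.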
Pages: 11