Nonparametric Sequential Clustering of Data Streams with Composite Distributions

Times cited: 1
Authors
Sreenivasan, Sreeram C. [1]
Bhashyam, Srikrishna [1]
Affiliations
[1] Indian Institute of Technology Madras, Chennai 600036, Tamil Nadu, India
Keywords
nonparametric methods; nonparametric clustering; sequential nonparametric testing; sequential hypothesis testing; nonparametric hypothesis testing; sequential decision rules; anomaly detection; ALGORITHM; EFFICIENT;
DOI
10.1016/j.sigpro.2022.108827
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes
0808; 0809;
Abstract
We study a sequential nonparametric clustering problem: grouping a finite set of S data streams into K clusters. The data streams are real-valued i.i.d. data sequences generated from unknown continuous distributions. The distributions themselves are organized into clusters according to their proximity to each other under a certain distance metric. The sequential tests are universal in the sense that they do not depend on the underlying configuration of the distribution clusters or on the distributions themselves, as long as the maximum intra-cluster distance is smaller than the minimum inter-cluster distance. We propose sequential nonparametric clustering tests for two cases: (1) K known and (2) K unknown. In both cases, we show that the proposed tests stop in finite time almost surely and are universally exponentially consistent. In addition, we bound the asymptotic growth rate of the expected stopping time as the probability of error goes to zero. Our results generalize earlier work on sequential nonparametric anomaly detection to the more general sequential nonparametric clustering problem. This generalization also provides a new test for the special case of anomaly detection in which the anomalous data streams can follow distinct probability distributions. We also devise a modification of the proposed tests that can yield significant computational savings with negligible performance degradation. Simulations show that all of the proposed sequential clustering tests outperform the corresponding fixed-sample-size tests in terms of the expected number of samples required for a given probability of error. The simulation results also demonstrate the advantage of the proposed clustering tests in anomaly detection problems with distinct anomalies. (c) 2022 Elsevier B.V. All rights reserved.
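The abstract does not state which distance metric or stopping rule the authors use, so the following is only a minimal sketch of the general idea for the K-known case, built on several assumptions. It uses the two-sample Kolmogorov-Smirnov (KS) distance between empirical CDFs as the metric (an assumption), single-linkage clustering to form K groups (our choice), and an illustrative stopping rule derived from the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality with a union bound over streams and over time. That rule is not the authors' test. All function names (`ks_distance`, `single_linkage`, `gap`, `sequential_cluster`) are hypothetical.

```python
import bisect
import math
import random


def ks_distance(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: sup_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    best = 0.0
    # Both empirical CDFs are right-continuous step functions, so the
    # supremum of their difference is attained at one of the pooled points.
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / len(xs)
        fy = bisect.bisect_right(ys, t) / len(ys)
        best = max(best, abs(fx - fy))
    return best


def single_linkage(dist, k):
    """Merge the two closest clusters (single linkage) until k clusters remain."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def gap(dist, clusters):
    """Minimum inter-cluster distance minus maximum intra-cluster distance."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    pairs = [(i, j) for i in range(len(dist)) for j in range(i + 1, len(dist))]
    intra = [dist[i][j] for i, j in pairs if label[i] == label[j]]
    inter = [dist[i][j] for i, j in pairs if label[i] != label[j]]
    return min(inter, default=math.inf) - max(intra, default=0.0)


def sequential_cluster(samplers, k, alpha=0.01, max_n=10_000):
    """Hypothetical sequential test for K known: one new sample per stream per step."""
    s = len(samplers)
    data = [[] for _ in range(s)]
    for n in range(1, max_n + 1):
        for i, draw in enumerate(samplers):
            data[i].append(draw())
        # Assumed DKW confidence radius, union-bounded over s streams and all n:
        # sum_n 2*s*exp(-2*n*eps_n**2) = sum_n alpha / (n*(n+1)) = alpha.
        eps = math.sqrt(math.log(2 * s * n * (n + 1) / alpha) / (2 * n))
        dist = [[0.0] * s for _ in range(s)]
        for i in range(s):
            for j in range(i + 1, s):
                dist[i][j] = dist[j][i] = ks_distance(data[i], data[j])
        clusters = single_linkage(dist, k)
        # On the DKW event, each KS estimate is within 2*eps of its true value,
        # so an observed gap above 4*eps also holds for the true distances.
        if gap(dist, clusters) > 4 * eps:
            return clusters, n
    return clusters, max_n


if __name__ == "__main__":
    random.seed(0)
    # Six synthetic streams in three groups (illustrative data, not from the paper).
    streams = [lambda mu=mu: random.gauss(mu, 1.0) for mu in (0, 0, 5, 5, 10, 10)]
    clusters, n = sequential_cluster(streams, k=3)
    print(sorted(sorted(c) for c in clusters), "stopped at n =", n)
```

The reasoning behind this rule is as follows. If every empirical CDF stays within eps_n of its true CDF in sup norm, the triangle inequality keeps each estimated KS distance within 2*eps_n of the true one. An observed gap above 4*eps_n then implies that the returned partition also has maximum intra-cluster distance below minimum inter-cluster distance under the true distributions, and for a fixed K only one partition can have that property. The per-time budget alpha/(n(n+1)) is one simple way to make the guarantee hold at a data-dependent stopping time. It is likely far more conservative than the tests analyzed in the paper.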
Pages: 11
Related papers (50 in total)
  • [21] Online nonparametric monitoring of heterogeneous data streams with partial observations based on Thompson sampling
    Ye, Honghan
    Xian, Xiaochen
    Cheng, Jing-Ru C.
    Hable, Brock
    Shannon, Robert W.
    Elyaderani, Mojtaba Kadkhodaie
    Liu, Kaibo
    IISE TRANSACTIONS, 2023, 55 (04) : 392 - 404
  • [22] A Systematic Review of Density Grid-Based Clustering for Data Streams
    Tareq, Mustafa
    Sundararajan, Elankovan A.
    Harwood, Aaron
    Abu Bakar, Azuraliza
    IEEE ACCESS, 2022, 10 : 579 - 596
  • [23] Statistical hierarchical clustering algorithm for outlier detection in evolving data streams
    Krleža, Dalibor
    Vrdoljak, Boris
    Brčić, Mario
    MACHINE LEARNING, 2021, 110 (01) : 139 - 184
  • [24] Two-Stage Sparse Representation Clustering for Dynamic Data Streams
    Chen, Jie
    Wang, Zhu
    Yang, Shengxiang
    Mao, Hua
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (10) : 6408 - 6420
  • [25] Efficient Subspace Clustering of Large-scale Data Streams with Misses
    Traganitis, Panagiotis A.
    Giannakis, Georgios B.
    2016 ANNUAL CONFERENCE ON INFORMATION SCIENCE AND SYSTEMS (CISS), 2016,
  • [27] UNIC: A fast nonparametric clustering
    Leopold, Nadiia
    Rose, Oliver
    PATTERN RECOGNITION, 2020, 100 (100)
  • [28] Nonparametric clustering for image segmentation
    Menardi, Giovanna
    STATISTICAL ANALYSIS AND DATA MINING, 2020, 13 (01) : 83 - 97
  • [29] A Differentially Private Big Data Nonparametric Bayesian Clustering Algorithm in Smart Grid
    Guan, Zhitao
    Lv, Zefang
    Sun, Xianwen
    Wu, Longfei
    Wu, Jun
    Du, Xiaojiang
    Guizani, Mohsen
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2020, 7 (04) : 2631 - 2641
  • [30] Nonparametric Clustering of Mixed Data Using Modified Chi-Squared Tests
    Xu, Yawen
    Gao, Xin
    Wang, Xiaogang
    ENTROPY, 2022, 24 (12)