Nonparametric Sequential Clustering of Data Streams with Composite Distributions

Cited by: 1
Authors
Sreenivasan, Sreeram C. [1]
Bhashyam, Srikrishna [1]
Affiliations
[1] Indian Inst Technol Madras, Chennai 600036, Tamil Nadu, India
Keywords
nonparametric methods; nonparametric clustering; sequential nonparametric testing; sequential hypothesis testing; nonparametric hypothesis testing; sequential decision rules; anomaly detection; ALGORITHM; EFFICIENT
DOI
10.1016/j.sigpro.2022.108827
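Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract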
We study a sequential nonparametric clustering problem to group a finite set of S data streams into K clusters. The data streams are real-valued i.i.d. data sequences generated from unknown continuous distributions. The distributions themselves are organized into clusters according to their proximity to each other based on a certain distance metric. The sequential tests are universal in the sense that they are independent of the underlying configuration of the distribution clusters, and the distributions themselves, as long as the maximum intra-cluster distance is smaller than the minimum inter-cluster distance. We propose sequential nonparametric clustering tests for two cases: (1) K known and (2) K unknown. In both cases, we show that the proposed sequential nonparametric clustering tests stop in finite time almost surely and are universally exponentially consistent. Further, we also bound the asymptotic growth rate of the expected stopping time as the probability of error goes to zero. Our results generalize earlier work on sequential nonparametric anomaly detection to the more general sequential nonparametric clustering problem. This generalization also provides a new test for the special case of anomaly detection where the anomalous data streams can follow distinct probability distributions. We also devise a modification of the proposed sequential nonparametric clustering tests that can result in significant computational savings with negligible performance degradation. Simulations show that all our proposed sequential clustering tests outperform the corresponding fixed sample size tests in terms of the expected number of samples for a given probability of error. The simulation results also demonstrate the advantage of our proposed clustering tests in anomaly detection problems with distinct anomalies. (c) 2022 Elsevier B.V. All rights reserved.
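The idea in the abstract (keep sampling until the clusters are clearly separated, for the K-known case) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' test: the Kolmogorov-Smirnov distance, the single-linkage grouping, the stopping threshold 2/sqrt(n), and all function and parameter names (`ks_distance`, `single_linkage`, `separation_gap`, `sequential_cluster`, `n_min`, `n_max`, `check_every`) are assumptions introduced here, since the abstract only speaks of "a certain distance metric" and does not state the decision rule.

```python
import random
from bisect import bisect_right
from itertools import combinations


def ks_distance(x, y):
    """Kolmogorov-Smirnov distance between the empirical CDFs of two samples."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
        for t in xs + ys
    )


def single_linkage(dist, k):
    """Merge the two closest clusters (single linkage) until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]
            ),
        )
        merged = clusters.pop(b)  # b > a, so index a is still valid
        clusters[a].extend(merged)
    return clusters


def separation_gap(dist, clusters):
    """Minimum inter-cluster distance minus maximum intra-cluster distance."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    pairs = list(combinations(range(len(dist)), 2))
    intra = [dist[i][j] for i, j in pairs if label[i] == label[j]]
    inter = [dist[i][j] for i, j in pairs if label[i] != label[j]]
    return min(inter) - max(intra, default=0.0)


def sequential_cluster(sample, num_streams, k, n_min=20, n_max=5000, check_every=10):
    """Draw one sample per stream per step; stop once the gap clears a threshold.

    The threshold 2 / sqrt(n) is a hypothetical choice for illustration only.
    Assumes 2 <= k <= num_streams.
    """
    data = [[] for _ in range(num_streams)]
    clusters = [[s] for s in range(num_streams)]
    for n in range(1, n_max + 1):
        for s in range(num_streams):
            data[s].append(sample(s))
        if n < n_min or n % check_every:
            continue  # only test every check_every samples to save computation
        dist = [[0.0] * num_streams for _ in range(num_streams)]
        for i, j in combinations(range(num_streams), 2):
            dist[i][j] = dist[j][i] = ks_distance(data[i], data[j])
        clusters = single_linkage(dist, k)
        if separation_gap(dist, clusters) > 2.0 / n ** 0.5:
            break
    return sorted(sorted(c) for c in clusters), n


# Toy usage: four Gaussian streams forming two well-separated clusters.
rng = random.Random(0)
means = [0.0, 0.3, 3.0, 3.3]
clusters, stopped_at = sequential_cluster(
    lambda s: rng.gauss(means[s], 1.0), num_streams=4, k=2
)
print(clusters, stopped_at)
```

The stopping rule mirrors the separation condition in the abstract: sampling ends once the estimated minimum inter-cluster distance exceeds the estimated maximum intra-cluster distance by a margin that shrinks as more samples arrive. A fixed sample size test would instead run the same clustering once at a preset n.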
Pages: 11
Related papers
50 in total
  • [1] Sequential Nonparametric Detection of Anomalous Data Streams
    Sreenivasan, Sreeram C.
    Bhashyam, Srikrishna
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 (28) : 932 - 936
  • [2] K-Medoids Clustering of Data Sequences With Composite Distributions
    Wang, Tiexing
    Li, Qunwei
    Bucci, Donald J.
    Liang, Yingbin
    Chen, Biao
    Varshney, Pramod K.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2019, 67 (08) : 2093 - 2106
  • [3] Accelerated Sequential Data Clustering
    Mortazavi, Reza
    Enayati, Elham
    Basiri, Abdolali
    JOURNAL OF CLASSIFICATION, 2024, 41 (02) : 245 - 263
  • [4] Nonparametric clustering of RNA-sequencing data
    Lozano, Gabriel
    Atallah, Nadia
    Levine, Michael
    STATISTICAL ANALYSIS AND DATA MINING, 2023, 16 (06) : 547 - 559
  • [5] Incomplete high dimensional data streams clustering
    Najib, Fatma M.
    Ismail, Rasha M.
    Badr, Nagwa L.
    Gharib, Tarek F.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (03) : 4227 - 4243
  • [6] CIS: A nonparametric clustering algorithm for gene expression data
    Zhao, YH
    Yin, Y
    Wang, GR
    Mao, KM
    PROCEEDINGS OF THE 11TH JOINT INTERNATIONAL COMPUTER CONFERENCE, 2005, : 651 - 656
  • [7] JOINT SEQUENTIAL DETECTION AND ISOLATION FOR DEPENDENT DATA STREAMS
    Chaudhuri, Anamitra
    Fellouris, Georgios
    ANNALS OF STATISTICS, 2024, 52 (05) : 1899 - 1926
  • [8] Online Sparse Representation Clustering for Evolving Data Streams
    Chen, Jie
    Yang, Shengxiang
    Fahy, Conor
    Wang, Zhu
    Guo, Yinan
    Chen, Yingke
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01) : 525 - 539
  • [9] Clustering based approach for incomplete data streams processing
    Najib, Fatma M.
    Ismail, Rasha M.
    Badr, Nagwa L.
    Gharib, Tarek F.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (03) : 3213 - 3227
  • [10] Sequential Subspace Clustering via Temporal Smoothness for Sequential Data Segmentation
    Liu, Haijun
    Cheng, Jian
    Wang, Feng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (02) : 866 - 878