Nonparametric Sequential Clustering of Data Streams with Composite Distributions

Cited by: 1
Authors
Sreenivasan, Sreeram C. [1]
Bhashyam, Srikrishna [1]
Affiliations
[1] Indian Inst Technol Madras, Chennai 600036, Tamil Nadu, India
Keywords
nonparametric methods; nonparametric clustering; sequential nonparametric testing; sequential hypothesis testing; nonparametric hypothesis testing; sequential decision rules; anomaly detection; ALGORITHM; EFFICIENT
DOI
10.1016/j.sigpro.2022.108827
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract
We study a sequential nonparametric clustering problem to group a finite set of S data streams into K clusters. The data streams are real-valued i.i.d. data sequences generated from unknown continuous distributions. The distributions themselves are organized into clusters according to their proximity to each other under a given distance metric. The sequential tests are universal in the sense that they are independent of the underlying configuration of the distribution clusters, and of the distributions themselves, as long as the maximum intra-cluster distance is smaller than the minimum inter-cluster distance. We propose sequential nonparametric clustering tests for two cases: (1) K known and (2) K unknown. In both cases, we show that the proposed sequential nonparametric clustering tests stop in finite time almost surely and are universally exponentially consistent. Further, we bound the asymptotic growth rate of the expected stopping time as the probability of error goes to zero. Our results generalize earlier work on sequential nonparametric anomaly detection to the more general sequential nonparametric clustering problem. This generalization also provides a new test for the special case of anomaly detection where the anomalous data streams can follow distinct probability distributions. We also devise a modification of the proposed sequential nonparametric clustering tests that can result in significant computational savings with negligible performance degradation. Simulations show that all our proposed sequential clustering tests outperform the corresponding fixed sample size tests in terms of the expected number of samples for a given probability of error. The simulation results also demonstrate the advantage of our proposed clustering tests in anomaly detection problems with distinct anomalies. (c) 2022 Elsevier B.V. All rights reserved.
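The separation condition stated in the abstract (maximum intra-cluster distance smaller than minimum inter-cluster distance) can be illustrated with a toy sketch. This is not the authors' test. The abstract does not name the distance, the clustering rule, or the stopping rule, so the Kolmogorov-Smirnov distance, the single-linkage merging, the `c / sqrt(n)` threshold, and every function and parameter name below are assumptions chosen only for illustration.

```python
import random
from itertools import combinations


def ks_distance(x, y):
    """Largest gap between the empirical CDFs of two samples (assumed metric)."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    gap = 0.0
    while i < len(xs) and j < len(ys):
        t = min(xs[i], ys[j])
        # Advance both CDFs past every sample <= t, then compare their heights.
        while i < len(xs) and xs[i] <= t:
            i += 1
        while j < len(ys) and ys[j] <= t:
            j += 1
        gap = max(gap, abs(i / len(xs) - j / len(ys)))
    return gap


def single_linkage(dist, k):
    """Merge the two closest clusters until k clusters remain (assumed rule)."""
    clusters = [[s] for s in range(len(dist))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                dist[u][v] for u in clusters[p[0]] for v in clusters[p[1]]
            ),
        )
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


def sequential_cluster(samplers, k, c=1.0, n_min=10, n_max=5000):
    """Draw one sample per stream per step; stop once clusters look separated.

    Hypothetical stopping rule: stop when (min inter-cluster distance)
    minus (max intra-cluster distance) exceeds c / sqrt(n).
    """
    data = [[] for _ in samplers]
    s = len(data)
    clusters = None
    for n in range(1, n_max + 1):
        for stream, draw in zip(data, samplers):
            stream.append(draw())
        if n < n_min:
            continue
        dist = [[ks_distance(data[u], data[v]) for v in range(s)] for u in range(s)]
        clusters = single_linkage(dist, k)
        label = {u: idx for idx, members in enumerate(clusters) for u in members}
        pairs = list(combinations(range(s), 2))
        intra = max((dist[u][v] for u, v in pairs if label[u] == label[v]), default=0.0)
        inter = min((dist[u][v] for u, v in pairs if label[u] != label[v]),
                    default=float("inf"))
        if inter - intra > c / n ** 0.5:
            return clusters, n
    return clusters, n_max


# Toy usage: two streams near 0 and two streams near 5, with K = 2 known.
random.seed(0)
toy_samplers = [lambda: random.gauss(0.0, 1.0)] * 2 + [lambda: random.gauss(5.0, 1.0)] * 2
toy_clusters, toy_stop = sequential_cluster(toy_samplers, k=2)
```

In this toy setup the two distribution groups are far apart, so the sketch is expected to stop after few samples; the K-unknown case and the computational savings mentioned in the abstract are not covered here.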
Pages: 11