On Application of a Probabilistic K-Nearest Neighbors Model for Cluster Validation Problem

Cited by: 2
Authors
Volkovich, Zeev [1]
Barzily, Zeev [1]
Avros, Renata [1]
Toledano-Kitai, Dvora [1]
Affiliations
[1] ORT Braude Coll Engn, Software Engn Dept, IL-21982 Karmiel, Israel
Keywords
Clustering; Cluster stability; Data mining; K-Nearest neighbors; Resampling method; Number
DOI
10.1080/03610926.2011.562786
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
K-Nearest Neighbors is a widely used technique for classifying and clustering data. In this article, we address the cluster stability problem using probabilistic characteristics of this approach. We estimate the stability of partitions obtained by clustering pairs of samples. Partitions are presumed to be consistent if their clusters are stable. Cluster validity is quantified by the number of a point's K nearest neighbors that belong to the point's own sample. Under the null hypothesis that the samples are well mixed within the clusters, this quantity follows a Binomial distribution with K trials and success probability 0.5. Each cluster is represented by a summarizing index of the p-values calculated over all objects in the cluster, under the null hypothesis against the alternative, and the quality of a partition is evaluated through its worst cluster. The true number of clusters is taken to be the one whose empirical index distribution has the maximal suitable asymmetry. The proposed methodology generates the index distributions sequentially and assesses their asymmetry. Numerical experiments show that the methodology has a good ability to reveal the true number of clusters.
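The per-point test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which tail of the Binomial distribution is used or how the p-values are summarized, so the upper-tail p-value and the mean used as the cluster index are assumptions made here for illustration, and the function names `binom_upper_tail`, `same_sample_pvalues`, and `cluster_index` are hypothetical. The sketch covers a single cluster that contains points from two samples.

```python
import math


def binom_upper_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def same_sample_pvalues(points, labels, K):
    """For each point, count how many of its K nearest neighbors come from the
    point's own sample, then convert the count to an upper-tail p-value under
    the null hypothesis Binomial(K, 0.5). The choice of tail is an assumption."""
    pvals = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))  # brute-force K-NN
        same = sum(labels[j] == labels[i] for j in others[:K])
        pvals.append(binom_upper_tail(same, K))
    return pvals


def cluster_index(pvals):
    """Summarize a cluster by the mean p-value (an assumed summary statistic)."""
    return sum(pvals) / len(pvals)


# Two samples interleaved on a line: neighbors mostly come from the other sample.
mixed_points = [(float(x),) for x in range(10)]
mixed_labels = [x % 2 for x in range(10)]

# Two samples far apart inside one "cluster": neighbors all share the sample.
split_points = [(float(x),) for x in range(5)] + [(100.0 + x,) for x in range(5)]
split_labels = [0] * 5 + [1] * 5

mixed_index = cluster_index(same_sample_pvalues(mixed_points, mixed_labels, K=2))
split_index = cluster_index(same_sample_pvalues(split_points, split_labels, K=2))
print(mixed_index, split_index)
```

With these toy inputs, the separated samples yield smaller p-values than the interleaved ones, which is the kind of departure from the well-mixed null hypothesis that the index is meant to detect.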
Pages: 2997-3010
Number of pages: 14