On Application of a Probabilistic K-Nearest Neighbors Model for Cluster Validation Problem

Cited by: 2
Authors
Volkovich, Zeev [1]
Barzily, Zeev [1]
Avros, Renata [1]
Toledano-Kitai, Dvora [1]
Affiliations
[1] ORT Braude Coll Engn, Software Engn Dept, IL-21982 Karmiel, Israel
Keywords
Clustering; Cluster stability; Data mining; K-Nearest neighbors; RESAMPLING METHOD; NUMBER;
DOI
10.1080/03610926.2011.562786
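Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes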
020208 ; 070103 ; 0714 ;
Abstract
K-Nearest Neighbors is a widely used technique for classifying and clustering data. In this article, we address the cluster stability problem using probabilistic characteristics of this approach. We estimate the stability of partitions obtained by clustering pairs of samples. Partitions are presumed to be consistent if their clusters are stable. Cluster validity is quantified, for each point, by the number of its K nearest neighbors that belong to the point's own sample. Under the null hypothesis that the samples are well mixed within the clusters, this quantity follows a Binomial distribution with K trials and success probability 0.5. Each cluster is represented by a summarizing index of the p-values computed, under the null hypothesis, over all objects in the cluster, and the quality of a partition is evaluated through its worst cluster. The true number of clusters is taken to be the one whose empirical index distribution has maximal suitable asymmetry. The proposed methodology produces the index distributions sequentially and assesses their asymmetry. Numerical experiments show that the methodology is well able to reveal the true number of clusters.
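The per-point statistic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the brute-force neighbor search, the Euclidean distance, and the choice of a one-sided upper-tail p-value are all assumptions made for illustration, and the abstract does not specify the summarizing index, so none is computed here.

```python
import math

def binom_upper_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def same_sample_counts(points, labels, K):
    """For each point, count how many of its K nearest neighbors
    (Euclidean distance, the point itself excluded) carry the same sample label."""
    counts = []
    for i, p in enumerate(points):
        # Brute-force search: sort all other points by distance to p.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors = [j for _, j in dists[:K]]
        counts.append(sum(labels[j] == labels[i] for j in neighbors))
    return counts

def point_p_values(points, labels, K):
    """Upper-tail p-values under the null Binomial(K, 0.5) model.
    The one-sided tail is an assumption of this sketch."""
    return [binom_upper_tail(c, K) for c in same_sample_counts(points, labels, K)]

# Hypothetical toy cluster: two samples (labels 0 and 1) that do not mix at all.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
labels = [0, 0, 0, 1, 1, 1]
counts = same_sample_counts(points, labels, K=2)
p_values = point_p_values(points, labels, K=2)
```

In this toy cluster every point's two nearest neighbors come from its own sample, so each count equals K and each p-value is the smallest attainable for K = 2. In a well-mixed cluster the counts would instead scatter around K/2.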
Pages: 2997-3010
Page count: 14
Related papers
50 in total
  • [31] Density peaks clustering based on k-nearest neighbors sharing
    Fan, Tanghuai
    Yao, Zhanfeng
    Han, Longzhe
    Liu, Baohong
    Lv, Li
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (05)
  • [32] Wind power forecasting using the k-nearest neighbors algorithm
    Mangalova, E.
    Agafonov, E.
    INTERNATIONAL JOURNAL OF FORECASTING, 2014, 30 (02) : 402 - 406
  • [33] Information theoretic clustering using a k-nearest neighbors approach
    Vikjord, Vidar V.
    Jenssen, Robert
    PATTERN RECOGNITION, 2014, 47 (09) : 3070 - 3081
  • [34] k-Nearest Neighbors Optimization-Based Outlier Removal
    Yosipof, Abraham
    Senderowitz, Hanoch
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2015, 36 (08) : 493 - 506
  • [35] Predict the Reliability Life of Wafer Level Packaging using K-Nearest Neighbors algorithm with Cluster Analysis
    Chen, H. L.
    Chen, B. S.
    Chiang, K. N.
    2022 17TH INTERNATIONAL MICROSYSTEMS, PACKAGING, ASSEMBLY AND CIRCUITS TECHNOLOGY CONFERENCE (IMPACT), 2022,
  • [36] A More Realistic k-Nearest Neighbors Method and its Possible Applications to Everyday Problems
    Cadenas, Jose M.
    Carmen Garrido, M.
    Martinez-Espana, Raquel
    Munoz, Andres
    2017 13TH INTERNATIONAL CONFERENCE ON INTELLIGENT ENVIRONMENTS (IE 2017), 2017, : 52 - 59
  • [37] A New Algorithm for Large-Scale Geographically Weighted Regression with K-Nearest Neighbors
    Yang, Xiaoyue
    Yang, Yi
    Xu, Shenghua
    Han, Jiakuan
    Chai, Zhengyuan
    Yang, Gang
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2023, 12 (07)
  • [38] A k-nearest neighbors based approach applied to more realistic activity recognition datasets
    Cadenas, Jose M.
    Carmen Garrido, M.
    Martinez-Espana, Raquel
    Munoz, Andres
    JOURNAL OF AMBIENT INTELLIGENCE AND SMART ENVIRONMENTS, 2018, 10 (03) : 247 - 259
  • [39] A New K-Nearest Neighbors Classifier for Big Data Based on Efficient Data Pruning
    Saadatfar, Hamid
    Khosravi, Samiyeh
    Joloudari, Javad Hassannataj
    Mosavi, Amir
    Shamshirband, Shahaboddin
    MATHEMATICS, 2020, 8 (02)
  • [40] A novel ranked k-nearest neighbors algorithm for missing data imputation
    Khan, Yasir
    Shah, Said Farooq
    Asim, Syed Muhammad
    JOURNAL OF APPLIED STATISTICS, 2025, 52 (05) : 1103 - 1127