A fast consistent grid-based clustering algorithm

被引:1
作者
Tarasenko, Anton S. [1 ,2 ]
Berikov, Vladimir B. [1 ,2 ]
Pestunov, Igor A. [3 ]
Rylov, Sergey A. [3 ]
Ruzankin, Pavel S. [1 ,2 ]
机构
[1] Sobolev Inst Math, Novosibirsk, Russia
[2] Novosibirsk State Univ, Novosibirsk, Russia
[3] Fed Res Ctr Informat & Computat Technol, Novosibirsk, Russia
关键词
Clustering; Estimator for the number of clusters; Density level sets; Big data; EMPIRICAL PROCESSES; DENSITY;
D O I
10.1007/s10044-024-01354-0
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We propose a fast consistent grid-based algorithm that estimates the number of clusters for observations in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\mathbb {R}}}<^>d$$\end{document} and, besides, constructs an approximation for the clusters. Consistency is proved under certain conditions. The time complexity of the algorithm can be made linear retaining the consistency. Numerical experiments confirm high computational efficiency of the new algorithm and its ability to process large datasets.
引用
收藏
页数:11
相关论文
共 33 条
[1]  
Achtert E, 2006, LECT NOTES ARTIF INT, V3918, P119
[2]   PROBABILITY-INEQUALITIES FOR EMPIRICAL PROCESSES AND A LAW OF THE ITERATED LOGARITHM [J].
ALEXANDER, KS .
ANNALS OF PROBABILITY, 1984, 12 (04) :1041-1067
[3]  
Ankerst M, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P49
[4]  
[Anonymous], 2012, P MACHINE LEARNING R, V22, P1090
[5]   Recombinator-k-Means: An Evolutionary Algorithm That Exploits k-Means plus plus for Recombination [J].
Baldassi, Carlo .
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2022, 26 (05) :991-1003
[6]  
Chen Z, 2007, ICNC 2007: THIRD INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 5, PROCEEDINGS, P207
[7]   MEAN SHIFT, MODE SEEKING, AND CLUSTERING [J].
CHENG, YZ .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1995, 17 (08) :790-799
[8]   Estimating the number of clusters [J].
Cuevas, A ;
Febrero, M ;
Fraiman, R .
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2000, 28 (02) :367-382
[9]   Cluster analysis: a further approach based on density estimation [J].
Cuevas, A ;
Febrero, M ;
Fraiman, R .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2001, 36 (04) :441-459
[10]  
Ester M., 1996, Knowledge Discovery and Data Mining, V96, P226, DOI DOI 10.5555/3001460.3001507