The Complete Gradient Clustering Algorithm: properties in practical applications

Cited by: 25
Authors
Kulczycki, Piotr [1, 2]
Charytanowicz, Malgorzata [1, 3]
Kowalski, Piotr A. [1, 2]
Lukasik, Szymon [1, 2]
Affiliations
[1] Polish Acad Sci, Syst Res Inst, Ctr Informat Technol Data Anal Methods, PL-01447 Warsaw, Poland
[2] Cracow Univ Technol, Dept Automat Control & Informat Technol, Krakow, Poland
[3] Catholic Univ Lublin, Inst Math & Comp Sci, Lublin, Poland
Keywords
data analysis and exploration; clustering; nonparametric methods; kernel estimators; seed production; mobile phone operator; fuzzy controller; mean shift; density estimation
DOI
10.1080/02664763.2011.644526
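Chinese Library Classification (CLC) numbers
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract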
The aim of this paper is to present the Complete Gradient Clustering Algorithm, its application aspects and properties, and to illustrate them with specific practical problems from bioinformatics (the categorization of grains for seed production), management (the design of a marketing support strategy for a mobile phone operator) and engineering (the synthesis of a fuzzy controller). The main property of the Complete Gradient Clustering Algorithm is that it does not require strict assumptions regarding the desired number of clusters, which allows the number of clusters obtained to better suit the real data structure. In the basic version, a complete set of procedures is provided for determining the values of all functions and parameters on the basis of optimization criteria. It is also possible to point out parameters whose change influences the number of clusters (while still not fixing an exact number) and the proportion between the numbers of clusters in dense and sparse areas of the data. Moreover, the Complete Gradient Clustering Algorithm can be used to identify and possibly eliminate atypical elements (outliers). These properties proved very useful in the presented applications and may also be of value in many other practical problems.
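The abstract characterizes the algorithm only by its properties. As a rough illustration of the kernel-gradient idea behind it, the following Python sketch moves each data element uphill on a kernel density estimate and groups the elements that reach the same mode, so the number of clusters is an output rather than an input; elements with a low density estimate are flagged as atypical. This is the plain mean-shift procedure with a Gaussian kernel and one fixed smoothing parameter h, not the authors' exact Complete Gradient Clustering Algorithm, whose kernel, smoothing-parameter modification, stopping rule and merging rule are not given in this record. The function names, the merge radius h/2 and the quantile rule for atypical elements are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kde(x: Point, data: Sequence[Point], h: float) -> float:
    """Gaussian kernel density estimate at x with one common smoothing parameter h."""
    norm = len(data) * (math.sqrt(2.0 * math.pi) * h) ** len(x)
    return sum(math.exp(-_sq_dist(x, xi) / (2.0 * h * h)) for xi in data) / norm


def gradient_step(x: Point, data: Sequence[Point], h: float) -> Point:
    """Return x + h^2 * grad f(x) / f(x) for the Gaussian estimate f.

    For a Gaussian kernel this equals the kernel-weighted mean of the data
    (the mean-shift update). Squared distances are shifted by their minimum
    before exponentiation so that the weights cannot all underflow to zero.
    """
    sq = [_sq_dist(x, xi) for xi in data]
    s_min = min(sq)
    weights = [math.exp(-(s - s_min) / (2.0 * h * h)) for s in sq]
    total = sum(weights)
    return tuple(
        sum(w * xi[k] for w, xi in zip(weights, data)) / total for k in range(len(x))
    )


def gradient_clustering(data, h, max_iter=500, tol=1e-6, merge_radius=None):
    """Move every point uphill on the density estimate, then group the end points.

    The number of clusters is not an input: it is the number of distinct end
    points (density modes) found. merge_radius = h / 2 is an assumed heuristic.
    """
    if merge_radius is None:
        merge_radius = h / 2.0
    end_points = []
    for x in data:
        y = x
        for _ in range(max_iter):
            y_next = gradient_step(y, data, h)
            converged = _sq_dist(y_next, y) < tol * tol
            y = y_next
            if converged:
                break
        end_points.append(y)

    # Points whose end points lie within merge_radius of a center share a label.
    labels: List[int] = []
    centers: List[Point] = []
    for p in end_points:
        for k, c in enumerate(centers):
            if _sq_dist(p, c) <= merge_radius ** 2:
                labels.append(k)
                break
        else:
            centers.append(p)
            labels.append(len(centers) - 1)
    return labels, centers


def atypical_indices(data, h, r=0.05):
    """Indices of points whose density estimate is at or below its empirical r-quantile."""
    dens = [kde(x, data, h) for x in data]
    q = sorted(dens)[max(0, math.ceil(r * len(dens)) - 1)]
    return [i for i, f in enumerate(dens) if f <= q]


# Usage: two dense groups and one isolated point (hypothetical 1-D data).
data = [(0.0,), (0.1,), (0.2,), (0.15,), (0.05,),
        (5.0,), (5.1,), (5.2,), (4.9,), (5.05,),
        (20.0,)]
labels, centers = gradient_clustering(data, h=0.5)
print(len(centers), labels)  # 3 [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2]
print(len(gradient_clustering(data, h=10.0)[1]))  # 1
print(atypical_indices(data, h=0.5))  # [10]
```

In this sketch the smoothing parameter h plays the role of a parameter that influences, without fixing, the number of clusters: a small h keeps the three groups apart, while a large h merges them into a single mode. The proportion between clusters in dense and sparse regions, which the abstract also mentions, would require a locally varying smoothing parameter and is not modelled here.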
Pages: 1211-1224
Number of pages: 14