The Complete Gradient Clustering Algorithm: properties in practical applications

Cited by: 25
Authors
Kulczycki, Piotr [1, 2]
Charytanowicz, Malgorzata [1, 3]
Kowalski, Piotr A. [1, 2]
Lukasik, Szymon [1, 2]
Affiliations
[1] Polish Acad Sci, Syst Res Inst, Ctr Informat Technol Data Anal Methods, PL-01447 Warsaw, Poland
[2] Cracow Univ Technol, Dept Automat Control & Informat Technol, Krakow, Poland
[3] Catholic Univ Lublin, Inst Math & Comp Sci, Lublin, Poland
Keywords
data analysis and exploration; clustering; nonparametric methods; kernel estimators; seed production; mobile phone operator; fuzzy controller; MEAN SHIFT; DENSITY-ESTIMATION
DOI
10.1080/02664763.2011.644526
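Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract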
This paper presents the Complete Gradient Clustering Algorithm, its applicational aspects and properties, and illustrates them with specific practical problems from bioinformatics (the categorization of grains for seed production), management (the design of a marketing support strategy for a mobile phone operator) and engineering (the synthesis of a fuzzy controller). The main property of the Complete Gradient Clustering Algorithm is that it does not require strict assumptions regarding the desired number of clusters, which allows the number obtained to better fit the real data structure. In the basic version, a complete set of procedures can be provided for defining the values of all functions and parameters, relying on optimization criteria. It is also possible to point out parameters whose change influences the number of clusters (while still not fixing an exact number) and the proportion between the numbers of clusters in dense and sparse areas of data elements. Moreover, the Complete Gradient Clustering Algorithm can be used to identify and possibly eliminate atypical elements (outliers). These properties proved very useful in the presented applications and may also be useful in many other practical problems.
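The idea that the number of clusters follows from the data rather than from a preset value can be illustrated with a minimal sketch. It is not the paper's procedure: it swaps in a plain mean-shift update on a Gaussian kernel density estimate, and the function name `gradient_cluster`, the smoothing parameter `h`, and the merge tolerance `merge_tol` are all assumptions made for illustration.

```python
import numpy as np

def gradient_cluster(x, h, n_iter=200, merge_tol=None):
    """Hypothetical helper: move every point uphill on a Gaussian kernel
    density estimate, then group points that end up at the same place.
    `h` is the smoothing parameter (bandwidth) of the kernel."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]          # treat a flat list as one-dimensional data
    y = x.copy()                # current positions of the moving points
    for _ in range(n_iter):
        # Squared distances between current positions and the fixed data.
        d2 = ((y[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-0.5 * d2 / h**2)
        # Mean-shift update: kernel-weighted mean of the data, i.e. a
        # gradient-ascent step on the density estimate with adaptive length.
        y = (w @ x) / w.sum(axis=1, keepdims=True)
    if merge_tol is None:
        merge_tol = 0.5 * h     # assumption: end points closer than this share a cluster
    labels = np.full(len(x), -1, dtype=int)
    modes = []
    for i, p in enumerate(y):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < merge_tol:
                labels[i] = k
                break
        else:
            # No existing mode is close enough, so this point starts a new cluster.
            modes.append(p)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)

# Toy data (made up): two tight groups on the real line.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels_small, modes_small = gradient_cluster(data, h=0.3)
labels_large, modes_large = gradient_cluster(data, h=10.0)
print(len(modes_small), len(modes_large))
```

In this toy setting the number of clusters is never passed in; it falls out of where the points converge. Changing `h` changes how many groups appear without fixing an exact count, which is the kind of parameter influence the abstract describes.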
Pages: 1211-1224
Number of pages: 14