Robust mixture model-based clustering with genetic algorithm approach

Times cited: 2
Authors
Nguyen Duc Thang [1]
Chen, Lihui [1]
Chan, Chee Keong [1]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Div Informat Engn, Singapore, Singapore
Keywords
Clustering; robust clustering; outliers; maximum likelihood; EM algorithm; mixture model; genetic algorithm; MULTIVARIATE LOCATION; LIKELIHOOD ESTIMATORS; K-MEANS; SELECTION;
DOI
10.3233/IDA-2010-0472
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we address the robustness issue of maximum likelihood based methods in data clustering. The probabilistic mixture model is a well-known approach to cluster analysis. However, because mixture-model algorithms rely on maximum likelihood estimation (MLE), they are often very sensitive to noise and outliers. In this work, we implement a variant of the classical mixture model-based clustering (M2C) following a proposed general framework for handling outliers. A Genetic Algorithm (GA) is incorporated into the framework to produce a novel algorithm called GA-based Partial M2C (GA-PM2C). Analytical and experimental studies show that GA-PM2C can overcome the negative impact of outliers in data clustering, and hence provides highly accurate and reliable clustering results. It also exhibits excellent consistency in performance and low sensitivity to initialization.
Pages: 357-373
Number of pages: 17