Robust clustering

Cited by: 14
Authors
Banerjee, Amit [2]
Dave, Rajesh N. [1]
Affiliations
[1] New Jersey Inst Technol, Dept Chem Biol & Pharmaceut Engn, Newark, NJ 07102 USA
[2] Penn State Univ Harrisburg, Sch Sci Engn & Technol, Middletown, PA USA
Keywords
C-MEANS ALGORITHM; IMAGE SEGMENTATION; K-MEANS; INITIALIZATION; NOISE; AGGLOMERATION; OPTIMIZATION; CHAMELEON; OUTLIERS; NUMBER;
DOI
10.1002/widm.49
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Historical and recent developments in the field of robust clustering and their applications are reviewed. The discussion focuses on different strategies that have been developed to reduce the sensitivity of clustering methods to outliers in data, while pointing out the importance of the need for both efficient partitioning and simultaneous robust model fitting. Although all clustering methods and algorithms have good partitioning capabilities when data are clean and free of outliers, they break down in the presence of outliers in the data. This is because classical development in the field of clustering has focused on assumptions such as that the data are free of noise and well distributed. Robust model fitting, while retaining the partitioning power, involves the development of methods and algorithms that reject these classical assumptions either by explicitly incorporating robust statistical methods (often regression based) or by recasting the clustering problem in a way that does so implicitly. In this review, the robust model fitting aspect is identified in pertinent methodological and algorithmic advances and tied to related developments in robust statistics wherever possible. The paper also includes representative samples of various applications of robust clustering methods to both synthetic and real-world datasets. © 2011 Wiley Periodicals, Inc.
Pages: 29-59
Page count: 31