Multi-objective genetic algorithm based clustering approach and its application to gene expression data

Cited by: 0
Authors
Özyer, T [1]
Liu, YM [1]
Alhajj, R [1]
Barker, K [1]
Affiliation
[1] Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
Source
ADVANCES IN INFORMATION SYSTEMS, PROCEEDINGS | 2004 / Vol. 3261
Keywords
multi-objective genetic algorithm; clustering; validity analysis; gene expression data analysis
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Gene clustering is a common method for analyzing similar data based on expression trajectories. Clustering algorithms generally require the number of clusters a priori, which is usually hard to estimate, even for domain experts. In this paper, we use a Niched Pareto k-means Genetic Algorithm (GA) to cluster mRNA data. Running the multi-objective GA yields a Pareto-optimal front whose solution set offers alternatives for the optimal number of clusters. We analyze the clustering results with two cluster validity techniques commonly cited in the literature, the DB index and the SD index, which lets us rank the candidate numbers of clusters under each validity index. We tested the proposed clustering approach in experiments on three data sets: figure2data, cancer (NCI60) and Leukaemia. The results are promising; they demonstrate the applicability and effectiveness of the proposed approach.
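The two ideas named in the abstract, scoring a clustering with the DB (Davies-Bouldin) index and keeping only Pareto-optimal candidates, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `davies_bouldin` and `pareto_front` are hypothetical, the use of Euclidean distance is an assumption, and the objective pair in the usage note (number of clusters, within-cluster variation, both minimized) is also an assumption, since the abstract does not state the objectives.

```python
import numpy as np


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard clustering (lower is better).

    Assumes Euclidean distance; the abstract does not specify the metric.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("the DB index needs at least two clusters")
    centroids = np.array([points[labels == c].mean(axis=0) for c in ids])
    # Scatter: mean distance from each member to its own centroid.
    scatter = np.array([
        np.linalg.norm(points[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    # Pairwise distances between centroids.
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # excludes i == j from the max below
    ratios = (scatter[:, None] + scatter[None, :]) / dist
    # Worst-case similarity per cluster, averaged over all clusters.
    return ratios.max(axis=1).mean()


def pareto_front(candidates):
    """Keep candidates not dominated when every objective is minimized."""
    front = []
    for i, a in enumerate(candidates):
        dominated = any(
            all(x <= y for x, y in zip(b, a)) and any(x < y for x, y in zip(b, a))
            for j, b in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append(a)
    return front
```

As a usage illustration with made-up numbers, `pareto_front([(2, 5.0), (3, 3.0), (4, 3.5), (5, 1.0)])` drops `(4, 3.5)`, because `(3, 3.0)` is at least as good on both assumed objectives. Each surviving candidate could then be ranked by `davies_bouldin` on its cluster assignment.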
Pages: 451-461
Number of pages: 11