Investigation of a new GRASP-based clustering algorithm applied to biological data

Times cited: 19
Authors
Nascimento, Maria C. V. [1]
Toledo, Franklina M. B. [1]
de Carvalho, Andre C. P. L. F. [1]
Affiliation
[1] Univ Sao Paulo, Inst Ciencias Matemat & Comp, BR-13560970 Sao Carlos, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Clustering; GRASP; Gene expression data; Bioinformatics; Graph-theoretic approach; Minimum sum; K-means; Expression
DOI
10.1016/j.cor.2009.02.014
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
A large amount of biological data has been produced in recent years. Important knowledge can be extracted from these data using data analysis techniques. Clustering plays an important role in data analysis by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its own bias, which makes it better suited to particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover, it presents a clustering algorithm that solves this formulation using GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three other well-known algorithms. The proposed algorithm produced the best clustering results, a finding confirmed by statistical tests. (C) 2009 Elsevier Ltd. All rights reserved.
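The abstract names GRASP but does not state the paper's formulation. As a minimal sketch of how a GRASP loop (greedy randomized construction followed by local search, repeated, keeping the best solution) can be applied to clustering, the Python below assumes a minimum sum-of-squares objective (the K-means criterion) in place of the authors' formulation. The objective, the seed-selection rule, the function names (`sse`, `construct`, `local_search`, `grasp_cluster`) and the parameter defaults (`alpha`, `max_iter`) are all illustrative assumptions, not the paper's method.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(data: List[Point], labels: List[int], k: int) -> float:
    """ASSUMED objective: sum of squared distances of points to their cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(data, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        centre = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(sq_dist(p, centre) for p in members)
    return total


def construct(data: List[Point], k: int, alpha: float, rng: random.Random) -> List[int]:
    """Greedy randomized construction: pick k seeds from a restricted candidate list (RCL)."""
    seeds = [rng.randrange(len(data))]
    while len(seeds) < k:
        # Greedy value of a candidate: squared distance to its nearest chosen seed.
        g = {i: min(sq_dist(data[i], data[s]) for s in seeds)
             for i in range(len(data)) if i not in seeds}
        g_max, g_min = max(g.values()), min(g.values())
        # alpha = 0 is purely greedy (farthest point); alpha = 1 is purely random.
        threshold = g_max - alpha * (g_max - g_min)
        rcl = [i for i, v in g.items() if v >= threshold]
        seeds.append(rng.choice(rcl))
    # Assign every point to its nearest seed.
    return [min(range(k), key=lambda c: sq_dist(p, data[seeds[c]])) for p in data]


def local_search(data: List[Point], labels: List[int], k: int) -> Tuple[List[int], float]:
    """Move single points between clusters while the objective strictly decreases."""
    labels = list(labels)
    best = sse(data, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(data)):
            home = labels[i]
            if labels.count(home) == 1:
                continue  # moving this point would empty its cluster
            best_c = home
            for c in range(k):
                if c == home:
                    continue
                labels[i] = c
                trial = sse(data, labels, k)
                if trial < best - 1e-12:
                    best, best_c = trial, c
            labels[i] = best_c  # keep the best target found for point i
            if best_c != home:
                improved = True
    return labels, best


def grasp_cluster(data: List[Point], k: int, alpha: float = 0.3,
                  max_iter: int = 20, seed: int = 0) -> Tuple[List[int], float]:
    """Repeat construction + local search and return the best partition found."""
    rng = random.Random(seed)
    best_labels: List[int] = []
    best_cost = float("inf")
    for _ in range(max_iter):
        labels, cost = local_search(data, construct(data, k, alpha, rng), k)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


if __name__ == "__main__":
    toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    labels, cost = grasp_cluster(toy, k=2)
    print(labels, round(cost, 4))
```

The parameter `alpha` trades greediness for diversity in the construction phase, which is what lets repeated GRASP iterations explore different starting partitions before local search refines each one.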
Pages: 1381-1388
Number of pages: 8