clValid: An R package for cluster validation

Cited by: 520
Authors
Brock, Guy [1]
Datta, Susmita [1]
Pihur, Vasyl [1]
Datta, Somnath [1]
Affiliations
[1] University of Louisville, School of Public Health and Information Sciences, Department of Bioinformatics and Biostatistics, Louisville, KY 40292, USA
Funding
National Science Foundation (USA)
Keywords
clustering; validation; R package; stability measures; biological annotation
DOI
10.18637/jss.v025.i04
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The R package clValid contains functions for validating the results of a clustering analysis. Three main types of cluster validation measures are available: "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to simultaneously evaluate several clustering algorithms while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically make use of the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods that allow the user to display the optimal validation scores and extract clustering results.
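The idea of scoring several candidate clusterings with an internal validation measure and keeping the best one can be illustrated with a small sketch. The code below is not clValid's implementation and does not call the package. It is a plain Python illustration that assumes the Dunn index (smallest between-cluster distance divided by the largest within-cluster diameter) as the internal measure. The toy points, the `candidates` dictionary, and the function name `dunn_index` are hypothetical and chosen only for this example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Dunn index: min between-cluster distance / max cluster diameter.

    Higher is better. Requires at least two clusters.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two points in the same cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points in different clusters.
    min_separation = min(
        dist(a, b)
        for members_a, members_b in combinations(clusters.values(), 2)
        for a in members_a
        for b in members_b
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point
    return min_separation / max_diameter


# Hypothetical toy data: three visually separated groups of three points.
points = [(0, 0), (0, 1), (1, 0),
          (5, 5), (5, 6), (6, 5),
          (10, 0), (10, 1), (11, 0)]

# Hypothetical candidate partitions, standing in for the output of
# different clustering algorithms or different numbers of clusters.
candidates = {
    "k=2": [0, 0, 0, 1, 1, 1, 1, 1, 1],
    "k=3": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "k=4": [0, 0, 0, 1, 1, 1, 2, 2, 3],
}

# Score every candidate and keep the one with the highest Dunn index.
scores = {name: dunn_index(points, labels) for name, labels in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

On this toy data the three-cluster partition scores highest, because merging two groups inflates the largest diameter and splitting one group shrinks the smallest separation. The same compare-and-select loop applies to any other internal measure.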
Pages: 1-22
Page count: 22