A modified correlation coefficient based similarity measure for clustering time-course gene expression data

Cited by: 24
Authors
Son, Young Sook [1]
Baek, Jangsun [1]
Affiliations
[1] Chonnam Natl Univ, Dept Stat, Kwangju 500757, South Korea
Funding
National Research Foundation, Singapore
Keywords
Pearson's correlation coefficient; Spearman's correlation coefficient; modified correlation coefficient; similarity; clustering; time-course gene expression data
DOI
10.1016/j.patrec.2007.09.015
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Gene expression levels are often measured consecutively in time through microarray experiments to detect the cellular processes underlying observed regulatory effects and to assign functionality to genes whose function is not yet known. Clustering methods allow us to group genes that show similar time-course expression profiles and that are thus likely to be co-regulated. The correlation coefficient, the most popular similarity measure in the context of gene expression data, is not very reliable in representing the association of two temporal profile patterns. Moreover, clustering methods that use the correlation coefficient generate the same clustering result even when the time points are permuted arbitrarily. We propose a new similarity measure for clustering time-course gene expression data. The proposed measure is based on the correlation coefficient and on two indices that represent, respectively, the concordance of the temporal profile patterns of two profiles and the concordance of the time points at which their maximum and minimum expression levels are measured. We applied the hierarchical clustering method with the proposed similarity measure to both synthetic and breast cancer cell line data. We observed favorable results compared to the correlation coefficient based method. The proposed similarity measure is simple to implement, and it is much more consistent for clustering than the correlation coefficient based method according to the cross-validation criterion. (c) 2007 Elsevier B.V. All rights reserved.
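The permutation issue raised in the abstract can be illustrated with a minimal sketch. It does not implement the authors' proposed measure, whose formula is not given in this record. It only shows that Pearson's correlation between two profiles is unchanged when the same reordering is applied to the time points of both. The profile values and the names `gene_a`, `gene_b`, and `pearson` are hypothetical and chosen purely for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression profiles measured at six consecutive time points.
gene_a = [0.1, 0.8, 1.5, 0.9, 0.2, -0.4]
gene_b = [0.0, 0.5, 1.2, 1.1, 0.4, -0.1]

# Apply one arbitrary reordering of the time points to both profiles.
rng = random.Random(0)
perm = list(range(len(gene_a)))
rng.shuffle(perm)
shuffled_a = [gene_a[i] for i in perm]
shuffled_b = [gene_b[i] for i in perm]

r_original = pearson(gene_a, gene_b)
r_shuffled = pearson(shuffled_a, shuffled_b)

# The two values agree up to floating-point rounding, so the temporal
# order of the measurements plays no role in the correlation.
print(r_original, r_shuffled)
```

Because every pairwise similarity is unchanged by such a reordering, any clustering built only on these correlations is unchanged as well, which is the limitation the abstract describes.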
Pages: 232-242
Number of pages: 11