Concepts of relative sample outlier (RSO) and weighted sample similarity (WSS) for improving performance of clustering genes: co-function and co-regulation

Times cited: 1
Authors
Bhattacharya, Anindya [1]
Chowdhury, Nirmalya [2]
De, Rajat K. [3]
Affiliations
[1] Univ Tennessee, Hlth Sci Ctr, Ctr Integrat & Translat Genom, Dept Microbiol Immunol & Biochem, Memphis, TN 38163 USA
[2] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, W Bengal, India
[3] Indian Stat Inst, Machine Intelligence Unit, Kolkata 700108, W Bengal, India
Keywords
similarity measure; z-score; P-value; functional enrichment; transcription factors; expression profiles; algorithm; identification; pathways; patterns
DOI
10.1504/IJDMB.2015.067322
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
The performance of a clustering algorithm depends largely on the chosen similarity measure, and efficiency in handling outliers is a major contributor to the success of a similarity measure. The better a similarity measure handles outliers when measuring similarity between genes, the better the clustering algorithm will perform in forming biologically relevant groups of genes. In the present article, we discuss the problem of handling outliers with different existing similarity measures and introduce the concept of Relative Sample Outlier (RSO). We formulate a new similarity, called Weighted Sample Similarity (WSS), incorporate it into the Euclidean distance and the Pearson correlation coefficient, and then use these measures in various clustering and biclustering algorithms to group gene expression profiles. Our results suggest that WSS improves the performance of all the considered clustering algorithms in terms of finding biologically relevant groups of genes.
Pages: 314-330
Page count: 17