Shrinkage-based similarity metric for cluster analysis of microarray data

Cited by: 24
Authors
Cherepinsky, V
Feng, JW
Rejali, M
Mishra, B
Affiliations
[1] NYU, Courant Inst Math Sci, New York, NY 10012 USA
[2] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
DOI
10.1073/pnas.1633770100
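Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification codes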
07 ; 0710 ; 09 ;
Abstract
The current standard correlation coefficient used in the analysis of microarray data was introduced by M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein [(1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868]. Its formulation is rather arbitrary. We give a mathematically rigorous correlation coefficient of two data vectors based on James-Stein shrinkage estimators. We use the assumptions described by Eisen et al., also using the fact that the data can be treated as transformed into normal distributions. While Eisen et al. use zero as an estimator for the expression vector mean mu, we start with the assumption that for each gene, mu is itself a zero-mean normal random variable [with a priori distribution N(0, tau^2)], and use Bayesian analysis to obtain a posteriori distribution of mu in terms of the data. The shrunk estimator for mu differs from the mean of the data vectors and ultimately leads to a statistically robust estimator for correlation coefficients. To evaluate the effectiveness of shrinkage, we conducted in silico experiments and also compared similarity metrics on a biological example by using the data set from Eisen et al. For the latter, we classified genes involved in the regulation of yeast cell-cycle functions by computing clusters based on various definitions of correlation coefficients and contrasting them against clusters based on the activators known in the literature. The estimated false positives and false negatives from this study indicate that using the shrinkage metric improves the accuracy of the analysis.
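The idea in the abstract can be illustrated with a minimal sketch, assuming the offset subtracted from each expression vector is the sample mean scaled by a shrinkage factor. The function name `shrinkage_similarity` and the parameter `gamma` are hypothetical; the abstract does not state how the shrinkage factor is estimated from the data, so it is passed in directly here. Under this assumption, `gamma = 0` uses zero as the mean estimate (as in Eisen et al.) and `gamma = 1` uses the sample mean (the Pearson coefficient).

```python
import math


def shrinkage_similarity(x, y, gamma):
    """Correlation-like similarity with each mean estimate shrunk toward zero.

    gamma is a hypothetical shrinkage factor in [0, 1] supplied by the caller:
    gamma = 0 subtracts nothing (zero as the mean estimate),
    gamma = 1 subtracts the sample mean (Pearson correlation).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")

    n = len(x)
    # Shrunk estimates of the two vector means.
    offset_x = gamma * sum(x) / n
    offset_y = gamma * sum(y) / n

    # Deviations from the shrunk means.
    dx = [v - offset_x for v in x]
    dy = [v - offset_y for v in y]

    # Normalise by the root sum of squared deviations of each vector.
    norm_x = math.sqrt(sum(d * d for d in dx))
    norm_y = math.sqrt(sum(d * d for d in dy))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("similarity is undefined for a zero-deviation vector")

    return sum(a * b for a, b in zip(dx, dy)) / (norm_x * norm_y)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [3.0, 2.0, 1.0]
    for gamma in (0.0, 0.5, 1.0):
        print(gamma, shrinkage_similarity(x, y, gamma))
```

For these two toy vectors the similarity moves from positive at `gamma = 0` to fully anti-correlated at `gamma = 1`, which shows how strongly the choice of mean estimate can change which genes cluster together.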
Pages: 9668-9673
Page count: 6
Related papers
50 in total
  • [41] A shrinkage-based comparative assessment of observed-to-expected disproportionality measures
    Gipson, Geoffrey
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2012, 21 (06) : 589 - 596
  • [42] A Review of Cluster Analysis for Time Course Microarray Data
    Sohn, In Suk
    Lee, Jae Won
    Kim, Seo Young
    KOREAN JOURNAL OF APPLIED STATISTICS, 2006, 19 (01) : 13 - 32
  • [43] Tracking With Sparse and Correlated Measurements via a Shrinkage-Based Particle Filter
    Kiring, Aroland
    Salman, Naveed
    Liu, Chao
    Esnaola, Inaki
    Mihaylova, Lyudmila
    IEEE SENSORS JOURNAL, 2017, 17 (10) : 3152 - 3164
  • [44] Computational enhancement of a shrinkage-based analysis of variance F-test proposed for differential gene expression analysis
    Pounds, Stanley B.
    BIOSTATISTICS, 2007, 8 (03) : 505 - 506
  • [45] Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number
    Cheung, Yiu-ming
    Jia, Hong
    PATTERN RECOGNITION, 2013, 46 (08) : 2228 - 2238
  • [46] Mutual Information Based Extrinsic Similarity for Microarray Analysis
    Ucar, Duygu
    Altiparmak, Fatih
    Ferhatosmanoglu, Hakan
    Parthasarathy, Srinivasan
    BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, PROCEEDINGS, 2009, 5462 : 424 - +
  • [47] The Standard Deviation Score: a novel similarity metric for data analysis
    Ismael, Osama
    JOURNAL OF BIG DATA, 2025, 12 (01)
  • [48] Shrinkage Covariance Matrix Approach for Microarray Data
    Karjanto, Suryaefiza
    Aripin, Rasimah
    PROCEEDINGS OF THE 20TH NATIONAL SYMPOSIUM ON MATHEMATICAL SCIENCES (SKSM20): RESEARCH IN MATHEMATICAL SCIENCES: A CATALYST FOR CREATIVITY AND INNOVATION, PTS A AND B, 2013, 1522 : 1262 - 1268
  • [49] Source Enumeration for Large Array Using Shrinkage-Based Detectors With Small Samples
    Huang, Lei
    Qian, Cheng
    So, Hing Cheung
    Fang, Jun
    IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2015, 51 (01) : 344 - 357
  • [50] A SIMILARITY MEASURE FOR CHEMICAL DATA: APPLICATIONS TO CLUSTER ANALYSIS
    Kolossvary, Istvan
    Wegscheider, Wolfhard
    JOURNAL OF CHEMOMETRICS, 1990, 4 (03) : 255 - 266