Statistical inference for simultaneous clustering of gene expression data

Cited by: 31
Authors
Pollard, KS [1]
van der Laan, MJ [1]
Affiliations
[1] Univ Calif Berkeley, Sch Publ Hlth, Dept Biostat, Berkeley, CA 94720 USA
Keywords
clustering; parameter; bootstrap; gene expression; drug discovery
DOI
10.1016/S0025-5564(01)00116-X
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function theta = Phi(P) of the true data generating distribution P, and an estimate is obtained by applying this function to the empirical distribution P_n. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as parameters which are compositions of individual mappings for clustering patients and genes. This framework allows one to assess classical properties of clustering methods, such as consistency, and to formally study statistical inference regarding the clustering parameter. We present results of simulations designed to assess the asymptotic validity of different bootstrap methods for estimating the distribution of Phi(P_n). The method is illustrated on a publicly available data set. © 2002 Published by Elsevier Science Inc.
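The plug-in and bootstrap idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a one-way simplification (genes only, no patient clustering), a small k-medoids-style routine standing in for Phi, a nonparametric bootstrap over samples, and simulated toy data. The names `cluster_genes`, `bootstrap_stability`, and `toy` are hypothetical.

```python
import random


def gene_distance(data, g, h):
    """Euclidean distance between the profiles of genes g and h across samples."""
    return sum((row[g] - row[h]) ** 2 for row in data) ** 0.5


def cluster_genes(data, k=2, max_iter=20):
    """Hypothetical stand-in for Phi applied to an empirical distribution.

    data: list of samples, each a list of p gene expression values.
    Returns the gene partition as a frozenset of frozensets, so that
    two partitions compare equal regardless of cluster label order.
    """
    p = len(data[0])
    dist = [[gene_distance(data, g, h) for h in range(p)] for g in range(p)]
    medoids = [c * p // k for c in range(k)]  # deterministic, evenly spaced start
    labels = [0] * p
    for _ in range(max_iter):
        # Assign each gene to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[g][medoids[c]]) for g in range(p)]
        updated = []
        for c in range(k):
            members = [g for g in range(p) if labels[g] == c]
            if members:
                # New medoid: the member with the smallest total distance to the others.
                updated.append(min(members, key=lambda m: sum(dist[m][g] for g in members)))
            else:
                updated.append(medoids[c])  # keep the old medoid for an empty cluster
        if updated == medoids:
            break
        medoids = updated
    return frozenset(
        frozenset(g for g in range(p) if labels[g] == c) for c in set(labels)
    )


def bootstrap_stability(data, n_boot=200, k=2, seed=0):
    """Nonparametric bootstrap: resample samples with replacement, re-cluster,
    and report how often the bootstrap partition equals the original one."""
    rng = random.Random(seed)
    n = len(data)
    original = cluster_genes(data, k)
    hits = 0
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        if cluster_genes(resample, k) == original:
            hits += 1
    return original, hits / n_boot


# Simulated toy data (an assumption, not from the paper): 20 samples, 6 genes,
# genes 0-2 centred at 0 and genes 3-5 centred at 3.
toy_rng = random.Random(1)
toy = [
    [toy_rng.gauss(0.0, 0.5) for _ in range(3)]
    + [toy_rng.gauss(3.0, 0.5) for _ in range(3)]
    for _ in range(20)
]
partition, stability = bootstrap_stability(toy, n_boot=100)
print(sorted(sorted(block) for block in partition), stability)
```

Representing the partition as a set of sets sidesteps label switching between bootstrap replicates. The proportion of matching replicates is one crude summary of the bootstrap distribution of the estimated clustering; the paper's simulations compare several bootstrap methods, which this sketch does not attempt.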
Pages: 99-121
Page count: 23