Statistical inference for simultaneous clustering of gene expression data

Cited by: 31
Authors
Pollard, KS [1]
van der Laan, MJ [1]
Affiliations
[1] Univ Calif Berkeley, Sch Publ Hlth, Dept Biostat, Berkeley, CA 94720 USA
Keywords
clustering; parameter; bootstrap; gene expression; drug discovery
DOI
10.1016/S0025-5564(01)00116-X
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function theta = Phi(P) of the true data generating distribution P, and an estimate is obtained by applying this function to the empirical distribution P_n. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as parameters which are compositions of individual mappings for clustering patients and genes. This framework allows one to assess classical properties of clustering methods, such as consistency, and to formally study statistical inference regarding the clustering parameter. We present results of simulations designed to assess the asymptotic validity of different bootstrap methods for estimating the distribution of Phi(P_n). The method is illustrated on a publicly available data set. © 2002 Published by Elsevier Science Inc.
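The plug-in idea in the abstract (estimate theta = Phi(P) by applying the same map Phi to the empirical distribution, then use the bootstrap to approximate the sampling distribution of that estimate) can be illustrated with a small sketch. Everything below is an assumption made for illustration and not the authors' procedure: the toy map `phi` (a one-dimensional 2-means on per-gene mean expression), the helper `bootstrap_stability`, the simulated data, and all parameter values are hypothetical. Only the nonparametric bootstrap, which resamples patients with replacement, is shown.

```python
import random
from statistics import mean


def phi(samples, iters=20):
    """Toy clustering parameter (hypothetical, for illustration only).

    samples: list of rows, one per patient, each a list of gene values.
    Splits genes into two clusters by running 2-means on the per-gene
    mean expression. Label 0 is the low-mean cluster and label 1 the
    high-mean cluster, so labels are comparable across resamples.
    """
    n_genes = len(samples[0])
    gene_means = [mean(row[j] for row in samples) for j in range(n_genes)]
    lo, hi = min(gene_means), max(gene_means)  # deterministic start
    labels = tuple(0 for _ in gene_means)
    for _ in range(iters):
        labels = tuple(
            0 if abs(m - lo) <= abs(m - hi) else 1 for m in gene_means
        )
        low = [m for m, lab in zip(gene_means, labels) if lab == 0]
        high = [m for m, lab in zip(gene_means, labels) if lab == 1]
        lo = mean(low)  # never empty: the smallest mean is always in it
        hi = mean(high) if high else hi
    return labels


def bootstrap_stability(samples, n_boot=200, seed=0):
    """Nonparametric bootstrap of phi applied to the empirical distribution.

    Returns the plug-in estimate and, for each gene, the fraction of
    bootstrap replicates that assign it the same label as the estimate.
    """
    rng = random.Random(seed)
    n = len(samples)
    theta_n = phi(samples)
    agree = [0] * len(theta_n)
    for _ in range(n_boot):
        # Resample patients (rows) with replacement from the observed data.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        theta_star = phi(resample)
        for j, (a, b) in enumerate(zip(theta_n, theta_star)):
            agree[j] += a == b
    return theta_n, [count / n_boot for count in agree]


# Simulated data (an assumption): 30 patients, 6 genes, the first three
# genes centered at 0 and the last three centered at 3, unit-variance noise.
data_rng = random.Random(1)
data = [
    [data_rng.gauss(0, 1) for _ in range(3)]
    + [data_rng.gauss(3, 1) for _ in range(3)]
    for _ in range(30)
]
theta_n, stability = bootstrap_stability(data)
print(theta_n)
print(stability)
```

The per-gene agreement fractions are one simple summary of the bootstrap distribution of the estimate. A two-way version would compose a patient-clustering map with a gene-clustering map inside `phi`, and the resampling loop would stay the same.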
Pages: 99-121
Number of pages: 23