Determination of minimum sample size and discriminatory expression patterns in microarray data

Cited by: 92
Authors
Hwang, DH [1]
Schmitt, WA [1]
Stephanopoulos, G [1]
Stephanopoulos, G [1]
Affiliations
[1] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
Funding
US National Institutes of Health (NIH)
DOI
10.1093/bioinformatics/18.9.1184
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Transcriptional profiling using microarrays can reveal important information about cellular and tissue expression phenotypes, but these measurements are costly and time-consuming. Additionally, tissue sample availability further constrains the number of arrays that can be analyzed in connection with a particular disease or state of interest. It is therefore important to provide a method for determining the minimum number of microarrays required to separate, with statistical reliability, distinct disease states or other physiological differences.

Results: Power analysis was applied to estimate the minimum sample size required for two-class and multi-class discrimination. The power analysis algorithm calculates the appropriate sample size for discrimination of phenotypic subtypes in a reduced dimensional space obtained by Fisher discriminant analysis (FDA). This approach was tested by applying the algorithm to existing data sets to estimate the minimum sample size required for drawing conclusions on multi-class distinction with statistical reliability. It was confirmed that when the minimum number of samples estimated from power analysis is used, group means in the FDA discrimination space are statistically different.
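
As a rough illustration of the kind of calculation a power analysis involves, the sketch below estimates a per-group sample size for a two-class comparison of means along a single axis, such as one discriminant direction. It is not the authors' algorithm, which the abstract does not describe in detail. It assumes a textbook normal approximation, and the function name `min_samples_per_group` and its parameters are hypothetical. The standardized effect size (difference in group means divided by the common standard deviation) is assumed to be known or estimated from pilot data.

```python
import math
from statistics import NormalDist


def min_samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Assumption: normal approximation, n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size. This is a generic textbook
    formula, not the method of the paper.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    std_normal = NormalDist()  # standard normal distribution
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance quantile
    z_power = std_normal.inv_cdf(power)          # quantile for the target power

    # Round up, since sample sizes are whole arrays per group.
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, chosen only for illustration.
    for d in (0.5, 1.0, 2.0):
        print(d, min_samples_per_group(d))
```

Under these assumptions, smaller separations between groups require more arrays per group, which is the trade-off the abstract's minimum-sample-size estimate addresses.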
Pages: 1184-1193
Page count: 10
Related papers
25 in total
  • [1] Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling
    Alizadeh, AA
    Eisen, MB
    Davis, RE
    Ma, C
    Lossos, IS
    Rosenwald, A
    Boldrick, JG
    Sabet, H
    Tran, T
    Yu, X
    Powell, JI
    Yang, LM
    Marti, GE
    Moore, T
    Hudson, J
    Lu, LS
    Lewis, DB
    Tibshirani, R
    Sherlock, G
    Chan, WC
    Greiner, TC
    Weisenburger, DD
    Armitage, JO
    Warnke, R
    Levy, R
    Wilson, W
    Grever, MR
    Byrd, JC
    Botstein, D
    Brown, PO
    Staudt, LM
    [J]. NATURE, 2000, 403 (6769) : 503 - 511
  • [2] Singular value decomposition for genome-wide expression data processing and modeling
    Alter, O
    Brown, PO
    Botstein, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (18) : 10101 - 10106
  • [3] [Anonymous], 1984, Multivariate Analysis
  • [4] Cohen J., 1998, Statistical Power Analysis for the Behavioral Sciences, V2nd
  • [5] DUDOIT S, 2000, IN PRESS J AM STAT A
  • [6] DUDOIT S, 2001, IN PRESS STAT SINICA
  • [7] Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring
    Golub, TR
    Slonim, DK
    Tamayo, P
    Huard, C
    Gaasenbeek, M
    Mesirov, JP
    Coller, H
    Loh, ML
    Downing, JR
    Caligiuri, MA
    Bloomfield, CD
    Lander, ES
    [J]. SCIENCE, 1999, 286 (5439) : 531 - 537
  • [8] Fundamental patterns underlying gene expression profiles: Simplicity from complexity
    Holter, NS
    Mitra, M
    Maritan, A
    Cieplak, M
    Banavar, JR
    Fedoroff, NV
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (15) : 8409 - 8414
  • [9] Johnson R. A., 1992, APPL MULTIVARIATE ST, V4
  • [10] Kraemer H.C., 1987, MANY SUBJECTS STAT P