Statistical development and evaluation of microarray gene expression data filters

Cited by: 21
Authors
Pounds, S [1]
Cheng, C [1]
Affiliations
[1] St Jude Childrens Res Hosp, Dept Biostat, Memphis, TN 38105 USA
Keywords
filter; microarray; pooled p-value; total error criterion; error minimization
DOI
10.1089/cmb.2005.12.482
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Filtering is a common practice used to simplify the analysis of microarray data by removing from subsequent consideration probe sets believed to be unexpressed. The m/n filter, which is widely used in the analysis of Affymetrix data, removes all probe sets having fewer than m present calls among a set of n chips. The m/n filter has been widely used without considering its statistical properties. The level and power of the m/n filter are derived. Two alternative filters, the pooled p-value filter and the error-minimizing pooled p-value filter, are proposed. The pooled p-value filter combines information from the present-absent p-values into a single summary p-value which is subsequently compared to a selected significance threshold. We show that the pooled p-value filter is the uniformly most powerful statistical test under a reasonable beta model and that it exhibits greater power than the m/n filter in all scenarios considered in a simulation study. The error-minimizing pooled p-value filter compares the summary p-value with a threshold determined to minimize a total-error criterion based on a partition of the distribution of all probes' summary p-values. The pooled p-value and error-minimizing pooled p-value filters clearly perform better than the m/n filter in a case-study analysis. The case-study analysis also demonstrates a proposed method for estimating the number of differentially expressed probe sets excluded by filtering and subsequent impact on the final analysis. The filter impact analysis shows that the use of even the best filter may hinder, rather than enhance, the ability to discover interesting probe sets or genes. S-plus and R routines to implement the pooled p-value and error-minimizing pooled p-value filters have been developed and are available from www.stjuderesearch.org/depts/biostats/index.html.
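The two basic filters described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' S-plus/R code. The abstract does not state the pooling statistic, so the sketch uses Fisher's method as a stand-in, and the paper's exact statistic may differ. The function names, the "P" label for a present call, and the threshold alpha = 0.05 are all assumptions made for illustration.

```python
import math


def mn_filter(calls, m):
    """m/n filter: keep a probe set only if at least m of its n chips
    carry a present call (assumed here to be encoded as "P")."""
    return sum(1 for c in calls if c == "P") >= m


def fisher_pooled_p(pvalues):
    """Pool n detection p-values into one summary p-value.

    Stand-in technique (assumption): Fisher's method, where
    -2 * sum(log p) follows a chi-square distribution with 2n degrees
    of freedom when every p-value is uniform on (0, 1).
    """
    n = len(pvalues)
    # Half of the Fisher statistic; clamp p to avoid log(0).
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # Chi-square survival function with even df 2n has a closed form:
    # exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!
    term = 1.0
    total = 1.0
    for k in range(1, n):
        term *= half_stat / k
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def pooled_p_filter(pvalues, alpha=0.05):
    """Keep a probe set if its summary p-value is at or below alpha
    (alpha is an illustrative threshold, not a value from the paper)."""
    return fisher_pooled_p(pvalues) <= alpha


if __name__ == "__main__":
    print(mn_filter(["P", "A", "P", "M"], m=2))
    print(pooled_p_filter([0.01, 0.01, 0.01]))
    print(pooled_p_filter([0.9, 0.8, 0.95]))
```

Unlike the m/n filter, which reduces each chip to a present or absent call before counting, the pooled approach uses the magnitude of every detection p-value, so several moderately small p-values can jointly pass the filter.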
Pages: 482-495
Page count: 14