SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements

Cited by: 16
Authors
Cabanski, Christopher R. [1 ]
Qi, Yuan [2 ]
Yin, Xiaoying [2 ,3 ]
Bair, Eric [4 ,5 ]
Hayward, Michele C. [2 ]
Fan, Cheng [2 ]
Li, Jianying [2 ]
Wilkerson, Matthew D. [2 ]
Marron, J. S. [1 ,2 ]
Perou, Charles M. [2 ,6 ,7 ]
Hayes, D. Neil [2 ,8 ]
Affiliations
[1] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC 27515 USA
[2] Univ N Carolina, Lineberger Comprehens Canc Ctr, Chapel Hill, NC 27599 USA
[3] Univ N Carolina, Dept Otolaryngol Head & Neck Surg, Chapel Hill, NC USA
[4] Univ N Carolina, Sch Dent, Chapel Hill, NC USA
[5] Univ N Carolina, Dept Biostat, Chapel Hill, NC USA
[6] Univ N Carolina, Dept Genet, Chapel Hill, NC USA
[7] Univ N Carolina, Dept Pathol & Lab Med, Chapel Hill, NC USA
[8] Univ N Carolina, Dept Internal Med, Div Med Oncol, Chapel Hill, NC USA
Funding
National Institutes of Health (USA);
Keywords
MICROARRAY DATA-ANALYSIS; GENE-EXPRESSION; ARRAY DATA; EXPERIMENTAL-DESIGN; NORMALIZATION; REPRODUCIBILITY; QUALITY; RNA; CLASSIFICATION; MODELS;
DOI
10.1371/journal.pone.0009905
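Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes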
07 ; 0710 ; 09 ;
Abstract
Contemporary high dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e. gene selection), prior to deriving any biological conclusions. These steps can dramatically change the interpretation of an experiment. Evaluation of processing steps has received limited attention in the literature. It is not straightforward to evaluate different processing methods and investigators are often unsure of the best method. We present a simple statistical tool, Standardized WithIn class Sum of Squares (SWISS), that allows investigators to compare alternate data processing methods, such as different experimental methods, normalizations, or technologies, on a dataset in terms of how well they cluster a priori biological classes. SWISS uses Euclidean distance to determine which method does a better job of clustering the data elements based on a priori classifications. We apply SWISS to three different gene expression applications. The first application uses four different datasets to compare different experimental methods, normalizations, and gene sets. The second application, using data from the MicroArray Quality Control (MAQC) project, compares different microarray platforms. The third application compares different technologies: a single Agilent two-color microarray versus one lane of RNA-Seq. These applications give an indication of the variety of problems that SWISS can be helpful in solving. The SWISS analysis of one-color versus two-color microarrays provides investigators who use two-color arrays the opportunity to review their results in light of a single-channel analysis, with all of the associated benefits offered by this design. Analysis of the MAQC data shows differential intersite reproducibility by array platform. SWISS also shows that one lane of RNA-Seq clusters data by biological phenotypes as well as a single Agilent two-color microarray.
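As an illustration of the idea in the abstract, the sketch below computes a standardized within-class sum of squares under one stated assumption: that the score is the within-class sum of squared Euclidean distances (each sample to its class mean) divided by the total sum of squared distances (each sample to the overall mean). The abstract does not spell out the formula, so the function name `swiss_score`, its arguments, and the toy data are hypothetical and not taken from the authors' software. Under this assumption the score lies between 0 and 1, and a lower value means the a priori classes are more tightly clustered.

```python
import numpy as np


def swiss_score(data, labels):
    """Within-class sum of squares divided by total sum of squares.

    Assumed definition (not confirmed by the abstract):
    data   -- array of shape (n_samples, n_features)
    labels -- a priori class label for each sample
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)

    # Total sum of squared Euclidean distances to the overall mean.
    overall_mean = data.mean(axis=0)
    total_ss = np.sum((data - overall_mean) ** 2)
    if total_ss == 0:
        raise ValueError("total sum of squares is zero; score undefined")

    # Sum of squared Euclidean distances to each class mean.
    within_ss = 0.0
    for cls in np.unique(labels):
        members = data[labels == cls]
        within_ss += np.sum((members - members.mean(axis=0)) ** 2)

    return within_ss / total_ss


# Toy data: four samples, two features, two well-separated groups.
samples = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good_labels = np.array(["A", "A", "B", "B"])  # labels match the groups
bad_labels = np.array(["A", "B", "A", "B"])   # labels cut across the groups

print(swiss_score(samples, good_labels))  # small: classes are tight
print(swiss_score(samples, bad_labels))   # near 1: classes explain little
```

To compare two processing methods in this spirit, one would compute the score for the same samples and class labels under each method and prefer the method with the lower score, keeping in mind that this sketch does not include any significance test for the difference.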
Pages: 13