Systematic Evaluation of Gene Expression Data Analysis Methods Using Benchmark Data

Cited by: 0
Authors
Yang, Henry [1]
Affiliation
[1] Natl Univ Singapore, Canc Sci Inst Singapore, 14 Med Dr 12-01, Singapore 117599, Singapore
Source
10TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS | 2016 / Vol. 477
Keywords
Differential expression; Statistical method evaluation; Spike-in validation; Statistical analysis; Microarray data; Normalization
DOI
10.1007/978-3-319-40126-3_10
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Because experimental validation datasets are scarce, data analysis methods for identifying differential expression from high-throughput expression profiling technologies such as microarray and RNA-seq cannot be properly validated statistically, and guidelines for selecting an appropriate method are therefore lacking. We applied mRNA spike-in approaches to develop a comprehensive set of experimental benchmark data and used it to evaluate various methods for identifying differential expression. Our results show that using the median log ratio to identify differential expression is superior to more complex and popular methods such as modified t-statistics. The median log ratio method is robust: reasonably high accuracy in identifying differentially expressed genes can be achieved even for data with a small number of replicates and strong experimental variation between replicates. Machine learning for classification of differential expression based on the benchmark dataset indicates that even more accurate methods for identifying differential expression exist. With this dataset, it can also be demonstrated that the methods' predictions of the false discovery rate based on a small number of replicates can be very inaccurate.
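The abstract does not spell out how the median log ratio is computed or thresholded, so the following is only a minimal sketch of the general idea, not the author's procedure. It assumes paired treated/control intensities per replicate, a log2 scale, and a cutoff of 1.0 (a two-fold change); the function names `median_log_ratios` and `call_differential`, the cutoff, and the toy data are all hypothetical.

```python
import math
from statistics import median


def median_log_ratios(treated, control):
    """Return, per gene, the median of log2(treated / control) over replicates.

    Both arguments map a gene name to a list of positive intensities,
    one value per replicate, in the same replicate order (an assumption).
    """
    result = {}
    for gene, treated_values in treated.items():
        log_ratios = [
            math.log2(t / c) for t, c in zip(treated_values, control[gene])
        ]
        result[gene] = median(log_ratios)
    return result


def call_differential(med_lr, threshold=1.0):
    """Return the genes whose absolute median log2 ratio reaches the cutoff.

    The default cutoff of 1.0 (two-fold change) is illustrative only.
    """
    return {gene for gene, value in med_lr.items() if abs(value) >= threshold}


# Toy data (invented): three replicates per gene.
# geneB has a single outlying replicate that the median ignores.
treated = {"geneA": [800.0, 400.0, 410.0], "geneB": [100.0, 105.0, 400.0]}
control = {"geneA": [100.0, 100.0, 100.0], "geneB": [100.0, 100.0, 100.0]}

med_lr = median_log_ratios(treated, control)
called = call_differential(med_lr)
```

In this toy case only `geneA` is called, because the one aberrant replicate of `geneB` does not move its median; a mean of the same log ratios would be pulled upward by that replicate. This is the kind of robustness to replicate variation that the abstract attributes to the method.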
Pages: 91-98
Number of pages: 8