GFS: fuzzy preprocessing for effective gene expression analysis

Cited by: 21
Authors
Belorkar, Abha [1]
Wong, Limsoon [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp, 13 Comp Dr, Singapore 117417, Singapore
Keywords
Gene expression analysis; Fuzzy scoring; Preprocessing; Normalization; LEUKEMIA; PREDICTION; MUSCLE; DMD
DOI
10.1186/s12859-016-1327-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Gene expression data produced on high-throughput platforms such as microarrays is susceptible to substantial variation that obscures useful biological information. Preprocessing the data with a suitable normalization method is therefore necessary, and it has a direct and massive impact on the quality of downstream analysis. However, standard normalization methods are known to perform poorly, especially in the presence of substantial batch effects and heterogeneity in gene expression data. Results: We present Gene Fuzzy Score (GFS), a simple preprocessing technique that largely reduces obscuring variation while retaining useful biological information. Using four sets of publicly available datasets containing batch effects and heterogeneity, we compare GFS with three standard normalization techniques as well as raw gene expression. Each method is evaluated with respect to the quality, consistency, and biological coherence of its processed output. GFS outperforms the other transformation techniques in all three aspects. Conclusion: Our approach to preprocessing is a stronger alternative to popular normalization techniques. We demonstrate that it achieves the essential goal of preprocessing: it is effective at making expression values from multiple samples comparable, even when they come from separate platforms, come from independent batches, or belong to a heterogeneous phenotype.
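The abstract names fuzzy scoring but does not spell out the scoring rule. As a rough illustration only, the sketch below assumes a rank-based rule: within one sample, genes ranked in the top `upper` fraction score 1, genes ranked below the top `lower` fraction score 0, and genes in between are linearly interpolated. The function name `fuzzy_scores` and the threshold values 0.05 and 0.15 are assumptions made for this example, not details taken from this record.

```python
def fuzzy_scores(expression, upper=0.05, lower=0.15):
    """Map one sample's expression values to scores in [0, 1] by rank.

    Assumed rule (hypothetical thresholds): genes ranked in the top `upper`
    fraction score 1, genes ranked below the top `lower` fraction score 0,
    and genes in between are linearly interpolated.
    """
    n = len(expression)
    if n == 0:
        return []
    # Order gene indices from highest to lowest expression.
    # Ties keep their input order because sorted() is stable.
    order = sorted(range(n), key=lambda i: expression[i], reverse=True)
    scores = [0.0] * n
    for position, gene in enumerate(order):
        # Fraction of genes ranked at or above this one.
        fraction = (position + 1) / n
        if fraction <= upper:
            scores[gene] = 1.0
        elif fraction > lower:
            scores[gene] = 0.0
        else:
            scores[gene] = (lower - fraction) / (lower - upper)
    return scores


# One sample with 20 genes, and the same sample with a scale and offset
# applied, as a crude stand-in for a batch effect.
sample = list(range(20, 0, -1))
shifted = [10 * value + 3 for value in sample]
print(fuzzy_scores(sample)[:4])
print(fuzzy_scores(sample) == fuzzy_scores(shifted))
```

Because the scores in this sketch depend only on within-sample ranks, any strictly increasing rescaling of a sample leaves them unchanged, which is one way a rank-based score can make values from different batches or platforms comparable.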
Pages: 16