YuGene: A simple approach to scale gene expression data derived from different platforms for integrated analyses

Cited by: 49
Authors
Lê Cao, Kim-Anh [1 ,2 ]
Rohart, Florian [3 ]
McHugh, Leo [1 ]
Korn, Othmar [3 ]
Wells, Christine A. [3 ,4 ]
Affiliations
[1] Univ Queensland, Queensland Facil Adv Bioinformat, St Lucia, Qld 4072, Australia
[2] Univ Queensland, Inst Mol Biol, St Lucia, Qld 4072, Australia
[3] Univ Queensland, Australian Inst Bioengn & Nanotechnol, St Lucia, Qld 4072, Australia
[4] Univ Glasgow, Coll Med Vet & Life Sci, Inst Infect Immun & Inflammat, Glasgow G12 8TA, Lanark, Scotland
Funding
Australian Research Council;
Keywords
Gene expression; Cross platform normalization; Microarray; Pluripotent stem cells; Affymetrix; Generation; Hybridization
DOI
10.1016/j.ygeno.2014.03.001
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Gene expression databases contain invaluable information about a range of cell states, but the question "Where is my gene of interest expressed?" remains one of the most difficult to systematically assess when relevant data is derived from different platforms. Barriers to integrating this data include disparities in data formats and scale, a lack of common identifiers, and the disproportionate contribution of a platform to the 'batch effect'. There are few purpose-built cross-platform normalization strategies, and most of these fit data to an idealized data structure, which in turn may compromise gene expression comparisons between different platforms. YuGene addresses this gap by providing a simple transform that assigns a modified cumulative proportion value to each measurement, without losing essential underlying information on data distributions or experimental correlates. The YuGene transform is applied to individual samples and is suitable for data with different distributions. YuGene is robust to combining datasets of different sizes, does not require global renormalization as new data is added, and does not require a common identifier. YuGene was benchmarked against commonly used normalization approaches, performing favorably in comparison to quantile (RMA), Z-score or rank methods. Implementation in the www.stemformatics.org resource provides users with expression queries across stem cell related datasets. Probe performance statistics, including poorly performing (never expressed) probes, and examples of probes/genes expressed in a sample-restricted manner are provided. The YuGene software is implemented as an R package available from CRAN. © 2014 Published by Elsevier Inc.
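The abstract describes a per-sample transform that assigns a cumulative proportion value to each measurement. The Python sketch below illustrates one plausible reading of that idea and is not the code of the YuGene R package: values in a single sample are sorted in decreasing order, the running sum is divided by the sample total, and the result is subtracted from 1 so that the highest measurement scores closest to 1. The function name `cumulative_proportion_transform`, the requirement of non-negative input, and the handling of ties are assumptions made for illustration only.

```python
def cumulative_proportion_transform(sample):
    """Map one sample's non-negative measurements to scores in [0, 1].

    Illustrative assumption, not the YuGene package implementation:
    score = 1 - (cumulative sum of values sorted in decreasing order) / total.
    """
    values = [float(v) for v in sample]
    if not values:
        raise ValueError("sample must contain at least one measurement")
    if any(v < 0 for v in values):
        raise ValueError("measurements are assumed to be non-negative")

    total = sum(values)
    if total == 0:
        # Nothing is expressed, so every measurement gets the lowest score.
        return [0.0] * len(values)

    # Indices of the measurements, from highest to lowest value.
    # Tied values keep their input order and receive different scores here.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)

    scores = [0.0] * len(values)
    running = 0.0
    for i in order:
        running += values[i]
        # Clamp to guard against tiny negative values from rounding.
        scores[i] = max(0.0, 1.0 - running / total)
    return scores


if __name__ == "__main__":
    # Two hypothetical samples on different scales give the same scores.
    print(cumulative_proportion_transform([1, 2, 3, 4]))
    print(cumulative_proportion_transform([10, 20, 30, 40]))
```

Because each score depends only on the values within one sample, a sample can be transformed without reference to any other dataset, which is consistent with the abstract's statement that no global renormalization is needed as new data is added.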
Pages: 239-251
Number of pages: 13
Related papers
53 in total
[1]   Probe mapping across multiple microarray platforms [J].
Allen, Jeffrey D. ;
Wang, Siling ;
Chen, Min ;
Girard, Luc ;
Minna, John D. ;
Xie, Yang ;
Xiao, Guanghua .
BRIEFINGS IN BIOINFORMATICS, 2012, 13 (05) :547-554
[2]   NCBI GEO: archive for functional genomics data sets-10 years on [J].
Barrett, Tanya ;
Troup, Dennis B. ;
Wilhite, Stephen E. ;
Ledoux, Pierre ;
Evangelista, Carlos ;
Kim, Irene F. ;
Tomashevsky, Maxim ;
Marshall, Kimberly A. ;
Phillippy, Katherine H. ;
Sherman, Patti M. ;
Muertter, Rolf N. ;
Holko, Michelle ;
Ayanbule, Oluwabukunmi ;
Yefanov, Andrey ;
Soboleva, Alexandra .
NUCLEIC ACIDS RESEARCH, 2011, 39 :D1005-D1010
[3]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[4]   Reference Maps of Human ES and iPS Cell Variation Enable High-Throughput Characterization of Pluripotent Cell Lines [J].
Bock, Christoph ;
Kiskinis, Evangelos ;
Verstappen, Griet ;
Gu, Hongcang ;
Boulting, Gabriella ;
Smith, Zachary D. ;
Ziller, Michael ;
Croft, Gist F. ;
Amoroso, Mackenzie W. ;
Oakley, Derek H. ;
Gnirke, Andreas ;
Eggan, Kevin ;
Meissner, Alexander .
CELL, 2011, 144 (03) :439-452
[5]   A comparison of normalization methods for high density oligonucleotide array data based on variance and bias [J].
Bolstad, BM ;
Irizarry, RA ;
Åstrand, M ;
Speed, TP .
BIOINFORMATICS, 2003, 19 (02) :185-193
[6]  
BOLSTAD BM, PREPROCESSCORE COLLE
[7]   Gene expression anti-profiles as a basis for accurate universal cancer signatures [J].
Bravo, Hector Corrada ;
Pihur, Vasyl ;
McCall, Matthew ;
Irizarry, Rafael A. ;
Leek, Jeffrey T. .
BMC BIOINFORMATICS, 2012, 13
[8]   Modelling schizophrenia using human induced pluripotent stem cells [J].
Brennand, Kristen J. ;
Simone, Anthony ;
Jou, Jessica ;
Gelboin-Burkhart, Chelsea ;
Tran, Ngoc ;
Sangar, Sarah ;
Li, Yan ;
Mu, Yangling ;
Chen, Gong ;
Yu, Diana ;
McCarthy, Shane ;
Sebat, Jonathan ;
Gage, Fred H. .
NATURE, 2011, 473 (7346) :221-+
[9]   A framework for oligonucleotide microarray preprocessing [J].
Carvalho, Benilton S. ;
Irizarry, Rafael A. .
BIOINFORMATICS, 2010, 26 (19) :2363-2367
[10]   Analysis of microarray data using Z score transformation [J].
Cheadle, C ;
Vawter, MP ;
Freed, WJ ;
Becker, KG .
JOURNAL OF MOLECULAR DIAGNOSTICS, 2003, 5 (02) :73-81