virtualArray: a R/bioconductor package to merge raw data from different microarray platforms

Cited by: 44
Authors
Heider, Andreas [1]
Alt, Ruediger [1]
Affiliations
[1] Univ Leipzig, Translat Ctr Regenerat Med Leipzig, D-04103 Leipzig, Germany
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Keywords
GENE-EXPRESSION OMNIBUS; OLIGONUCLEOTIDE ARRAYS; DIFFERENT GENERATIONS; NORMALIZATION METHODS;
DOI
10.1186/1471-2105-14-75
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Microarrays have become a routine tool to address diverse biological questions. Therefore, different types and generations of microarrays have been produced by several manufacturers over time. Likewise, the diversity of raw data deposited in public databases such as NCBI GEO or EBI ArrayExpress has grown enormously. This has resulted in databases currently containing several hundred thousand microarray samples clustered by different species, manufacturers and chip generations. While one of the original goals of these databases was to make the data available to other researchers for independent analysis and, where appropriate, integration with their own data, current software implementations could not provide that feature. Only those data sets generated on the same chip platform can be readily combined, and even here there are batch effects to be taken care of. A straightforward approach to deal with multiple chip types and batch effects has been missing. The software presented here was designed to solve both of these problems in a convenient and user-friendly way.
Results: The virtualArray software package can combine raw data sets using almost any chip types based on current annotations from NCBI GEO or Bioconductor. After establishing congruent annotations for the raw data, virtualArray can then directly employ one of seven implemented methods to adjust for batch effects in the data resulting from differences between the chip types used. Both steps can be tuned to the preferences of the user. When the run is finished, the whole dataset is presented as a conventional Bioconductor "ExpressionSet" object, which can be used as input to other Bioconductor packages.
Conclusions: Using this software package, researchers can easily integrate their own microarray data with data from public repositories or other sources that are based on different microarray chip types. Using the default approach, a robust and up-to-date batch effect correction technique is applied to the data.
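The two steps named in the Results (establishing congruent annotations across chip types, then adjusting for batch effects) can be illustrated with a small conceptual sketch. It is not the virtualArray implementation or API: the function names, the toy probe identifiers, and the expression values are all hypothetical, and per-batch mean-centering is used only as a simple stand-in for a batch adjustment; the abstract does not say which seven methods the package implements.

```python
from statistics import mean, median


def collapse_to_genes(expr, probe_to_gene):
    """Map probe-level rows to gene-level rows (median per sample).

    expr: {probe_id: [value per sample]}; probe_to_gene: {probe_id: gene}.
    Probes without an annotation are dropped.
    """
    grouped = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        grouped.setdefault(gene, []).append(values)
    # zip(*rows) iterates sample by sample across all probes of a gene
    return {g: [median(col) for col in zip(*rows)] for g, rows in grouped.items()}


def merge_platforms(batches):
    """Keep genes present in every batch, center each gene within each
    batch (illustrative stand-in for batch adjustment), and concatenate."""
    shared = set.intersection(*(set(b) for b in batches))
    merged = {}
    for gene in sorted(shared):
        row = []
        for batch in batches:
            batch_mean = mean(batch[gene])
            row.extend(v - batch_mean for v in batch[gene])
        merged[gene] = row
    return merged


# Hypothetical toy data: two chip types with different probe identifiers.
chip_a = {"a1": [8.0, 9.0], "a2": [10.0, 11.0], "a3": [5.0, 7.0]}
annot_a = {"a1": "TP53", "a2": "TP53", "a3": "GAPDH"}
chip_b = {"b1": [2.0, 4.0, 6.0], "b2": [1.0, 1.0, 4.0], "b3": [3.0, 3.0, 3.0]}
annot_b = {"b1": "TP53", "b2": "GAPDH", "b3": "MYC"}

genes_a = collapse_to_genes(chip_a, annot_a)
genes_b = collapse_to_genes(chip_b, annot_b)
merged = merge_platforms([genes_a, genes_b])
```

In this sketch `merged` holds one row per gene shared by both chips, with two samples from the first chip followed by three from the second. A gene measured on only one platform (here `MYC`) is dropped, which is the price of requiring a common identifier across chip types.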
Pages: 10