Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages

Cited by: 180
Authors
Taminau, Jonatan [1]
Meganck, Stijn [1]
Lazar, Cosmin [1]
Steenhoff, David [1]
Coletta, Alain [2]
Molter, Colin [2]
Duque, Robin [2]
de Schaetzen, Virginie [1]
Solis, David Y. Weiss [2]
Bersini, Hugues [2]
Nowe, Ann [1]
Affiliations
[1] Vrije Univ Brussel, AI CoMo, B-1050 Brussels, Belgium
[2] Univ Libre Bruxelles, IRIDIA, B-1050 Brussels, Belgium
Keywords
Batch effect removal; Data integration; Gene expression; Microarray repositories; InSilico DB; Reproducibility; GENE-EXPRESSION OMNIBUS; COMPARABILITY; VARIABILITY;
DOI
10.1186/1471-2105-13-335
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Background: With an abundance of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, analysis itself is no longer the problem; retrieving and consistently integrating all these data before delivering them to the wide variety of existing analysis tools becomes the new bottleneck. Results: We present the newly released inSilicoMerging R/Bioconductor package which, together with the earlier released inSilicoDb R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. The inSilicoMerging package also provides a set of five visual and six quantitative validation measures. Conclusions: By providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove the batch effects between data sets from different sources, and (iii) several validation tools enabling the inspection of the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository [https://insilicodb.org/app/].
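To make the idea of batch effect removal concrete, the following is a minimal sketch of one simple approach: centering each gene at zero within each data set before merging on the genes the data sets share. It is a generic illustration only, not the inSilicoMerging API; the function name `batch_mean_center`, the input layout, and the toy values are all hypothetical.

```python
from statistics import mean


def batch_mean_center(batches):
    """Center each gene at zero within each batch, then merge on shared genes.

    `batches` maps a batch name to {gene: [expression value per sample]}.
    This layout is an assumption made for the sketch.
    """
    # Only genes measured in every batch can be merged consistently.
    common = set.intersection(*(set(genes) for genes in batches.values()))
    merged = {gene: [] for gene in sorted(common)}
    for genes in batches.values():
        for gene in merged:
            values = genes[gene]
            center = mean(values)  # per-batch, per-gene mean
            merged[gene].extend(v - center for v in values)
    return merged


# Hypothetical toy data: study_B is shifted upward by a constant batch offset.
toy = {
    "study_A": {"TP53": [5.0, 7.0], "BRCA1": [2.0, 4.0], "EGFR": [1.0, 1.0]},
    "study_B": {"TP53": [9.0, 11.0], "BRCA1": [6.0, 8.0]},
}
merged = batch_mean_center(toy)
print(merged)
```

In this toy case the constant offset between the two studies disappears after centering, and the gene measured in only one study is dropped from the merged result. Real batch effects are rarely a pure shift, which is why inspecting the merged data with validation measures, as the abstract describes, remains necessary.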
Pages: 8