Blind normalization of public high-throughput databases

Cited by: 1
Authors
Ohse, Sebastian [1]
Boerries, Melanie [2, 3]
Busch, Hauke [4]
Affiliations
[1] Univ Freiburg, Inst Mol Med & Cell Res, Freiburg, Germany
[2] German Canc Res Ctr, German Canc Consortium DKTK, Heidelberg, Germany
[3] Univ Freiburg, Fac Med, Inst Med Bioinformat & Syst Med, Med Ctr, Freiburg, Germany
[4] Univ Lubeck, Inst Expt Dermatol, Lubeck, Germany
Keywords
Blind normalization; High-throughput data; Compressed sensing; Confounding factors; Matrix completion; Recovery
DOI
10.7717/peerj-cs.231
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The rise of high-throughput technologies in the domain of molecular and cell biology, as well as medicine, has generated an unprecedented amount of quantitative high-dimensional data. Public databases at present make a wealth of this data available, but appropriate normalization is critical for meaningful analyses integrating different experiments and technologies. Without such normalization, meta-analyses can be difficult to perform and the potential to address shortcomings in experimental designs, such as inadequate replicates or controls with public data, is limited. Because of a lack of quantitative standards and insufficient annotation, large scale normalization across entire databases is currently limited to approaches that demand ad hoc assumptions about noise sources and the biological signal. By leveraging detectable redundancies in public databases, such as related samples and features, we show that blind normalization without constraints on noise sources and the biological signal is possible. The inherent recovery of confounding factors is formulated in the theoretical framework of compressed sensing and employs efficient optimization on manifolds. As public databases increase in size and offer more detectable redundancies, the proposed approach is able to scale to more complex confounding factors. In addition, the approach accounts for missing values and can incorporate spike-in controls. Our work presents a systematic approach to the blind normalization of public high-throughput databases.
Pages: 16