Learning and Imputation for Mass-spec Bias Reduction (LIMBR)

Times cited: 11
Authors
Crowell, Alexander M. [1]
Greene, Casey S. [2]
Loros, Jennifer J. [3]
Dunlap, Jay C. [1]
Affiliations
[1] Geisel Sch Med Dartmouth, Dept Mol & Syst Biol, Hanover, NH 03755 USA
[2] Univ Penn, Perelman Sch Med, Dept Syst Pharmacol & Translat Therapeut, Philadelphia, PA 19104 USA
[3] Geisel Sch Med Dartmouth, Dept Biochem & Cell Biol, Hanover, NH 03755 USA
Funding
National Institutes of Health (NIH)
Keywords
PROTEOMICS; NORMALIZATION; HETEROGENEITY;
DOI
10.1093/bioinformatics/bty828
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Decreasing costs are making it feasible to perform time series proteomics and genomics experiments with more replicates and higher resolution than ever before. With more replicates and time points, proteome- and genome-wide patterns of expression are more readily discernible. These larger experiments require more batches, which exacerbates batch effects and increases the number of bias trends. In proteomics, where methods frequently produce missing data, this increasing scale also decreases the number of peptides observed in all samples. The sources of batch effects and missing data are incompletely understood, necessitating novel techniques.
Results: Here we show that, by exploiting the structure of time series experiments, it is possible to accurately and reproducibly model and remove batch effects. We implement the Learning and Imputation for Mass-spec Bias Reduction (LIMBR) software, which builds on previous block-based models of batch effects and includes features specific to time series and circadian studies. To aid in the analysis of time series proteomics experiments, which are often plagued with missing data points, we also integrate an imputation system. By combining imputation and time series-tailored bias modeling in one straightforward software package, LIMBR, we expect to significantly improve the quality and ease of large-scale proteomics and genomics time series experiments.
Availability and implementation: Python code and documentation are available for download at https://github.com/aleccrowell/LIMBR, and LIMBR can be downloaded and installed with its dependencies using 'pip install limbr'.
Supplementary information: Supplementary data are available at Bioinformatics online.
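The abstract mentions an integrated imputation system for peptides with missing observations but does not describe how it works. As a rough illustration of what such a step can look like, here is a minimal sketch of generic k-nearest-neighbour (KNN) imputation, assuming the data are a peptides-by-samples NumPy array with missing values stored as NaN. The function name knn_impute and its k parameter are hypothetical and are not LIMBR's API; LIMBR's own imputation may use a different method or different settings.

```python
import numpy as np


def knn_impute(data, k=3):
    """Fill NaN entries in a peptides-by-samples matrix (hypothetical helper).

    Illustration only, not part of LIMBR. Each incomplete row is compared
    with the fully observed rows using Euclidean distance over the samples
    it does have; each missing entry becomes the mean of the k nearest
    complete rows in that sample.
    """
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    complete = data[~missing.any(axis=1)]  # donor rows with no gaps
    if complete.shape[0] == 0:
        raise ValueError("need at least one fully observed row")
    k = min(k, complete.shape[0])
    imputed = data.copy()  # leave the caller's array untouched
    for i in np.where(missing.any(axis=1))[0]:
        obs = ~missing[i]
        # Distance from row i to every donor, using only observed samples.
        diffs = complete[:, obs] - data[i, obs]
        dist = np.sqrt((diffs ** 2).sum(axis=1))
        nearest = np.argsort(dist, kind="stable")[:k]
        # Average the k nearest donors in each sample that row i is missing.
        imputed[i, missing[i]] = complete[nearest][:, missing[i]].mean(axis=0)
    return imputed


# Toy example: the last peptide is missing its second sample.
peptides = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [9.0, 8.0, 7.0, 6.0],
    [1.0, np.nan, 3.0, 4.0],
])
print(knn_impute(peptides, k=2))
```

In the toy example, the two nearest complete rows are the first two, so the gap is filled with roughly 2.05. Restricting donors to fully observed rows keeps the sketch simple. However, as the abstract notes, larger experiments leave fewer peptides observed in all samples, so a practical imputer may also need to draw on partially observed donors.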
Pages: 1518-1526
Page count: 9