Flexible and Accessible Workflows for Improved Proteogenomic Analysis Using the Galaxy Framework

Cited by: 68
Authors
Jagtap, Pratik D. [1 ,2 ]
Johnson, James E. [3 ]
Onsongo, Getiria [3 ]
Sadler, Fredrik W. [4 ]
Murray, Kevin [2 ]
Wang, Yuanbo [5 ]
Shenykman, Gloria M. [6 ]
Bandhakavi, Sricharan [7 ]
Smith, Lloyd M. [6 ]
Griffin, Timothy J. [2 ]
Affiliations
[1] Univ Minnesota, Ctr Mass Spectrometry & Prote, St Paul, MN 55108 USA
[2] Univ Minnesota, Dept Biochem Mol Biol & Biophys, Minneapolis, MN 55455 USA
[3] Univ Minnesota, Minnesota Supercomp Inst, Minneapolis, MN 55455 USA
[4] St Olaf Coll, Northfield, MN 55057 USA
[5] Carleton Coll, Dept Comp Sci, Northfield, MN 55057 USA
[6] Univ Wisconsin, Dept Chem, Madison, WI 53705 USA
[7] Biorad Labs, Hercules, CA 94547 USA
Funding
National Science Foundation (USA);
Keywords
proteogenomics; workflows; salivary proteins; customized database generation; peptide corresponding to a novel proteoform; peptide-spectral match evaluation; HUMAN PROTEOME PROJECT; FALSE DISCOVERY RATES; TANDEM MASS-SPECTRA; RNA-SEQ DATA; SEARCH ENGINE; DATA SETS; IDENTIFICATION; PEPTIDES; GENE; PROTEINS;
DOI
10.1021/pr500812t
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Proteogenomics combines large-scale genomic and transcriptomic data with mass-spectrometry-based proteomic data to discover novel protein sequence variants and improve genome annotation. In contrast with conventional proteomic applications, proteogenomic analysis requires a number of additional data processing steps. Ideally, these required steps would be integrated and automated via a single software platform offering accessibility for wet-bench researchers as well as flexibility for user-specific customization and integration of new software tools as they emerge. Toward this end, we have extended the Galaxy bioinformatics framework to facilitate proteogenomic analysis. Using analysis of whole human saliva as an example, we demonstrate Galaxy's flexibility through the creation of a modular workflow incorporating both established and customized software tools that improve depth and quality of proteogenomic results. Our customized Galaxy-based software includes automated, batch-mode BLASTP searching and a Peptide Sequence Match Evaluator tool, both useful for evaluating the veracity of putative novel peptide identifications. Our complex workflow (approximately 140 steps) can be easily shared using built-in Galaxy functions, enabling its use and customization by others. Our results provide a blueprint for the establishment of the Galaxy framework as an ideal solution for the emerging field of proteogenomics.
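To illustrate the idea behind checking putative novel peptides against known protein sequences, here is a minimal Python sketch. It is not the authors' tool and does not run BLASTP: it substitutes a much simpler exact-substring check against a reference protein list. The function names, the example sequences, and the choice to treat isoleucine and leucine as equivalent (they have the same mass) are all assumptions made for illustration.

```python
def normalize(seq: str) -> str:
    """Uppercase the sequence and map I to L (the two are isobaric in MS)."""
    return seq.upper().replace("I", "L")


def flag_novel_peptides(peptides, reference_proteins):
    """Return peptides that do not occur exactly in any reference protein.

    A simplified stand-in for a BLASTP-based check: a peptide found
    verbatim in the reference set is treated as already known.
    """
    refs = [normalize(protein) for protein in reference_proteins]
    novel = []
    for peptide in peptides:
        key = normalize(peptide)
        if not any(key in ref for ref in refs):
            novel.append(peptide)
    return novel


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    reference = ["MKTAYIAKQRQISFVKSHFSRQ"]
    candidates = ["AYIAKQR", "AYLAKQR", "AYIAKQW"]
    print(flag_novel_peptides(candidates, reference))
```

A real analysis would use alignment-based searching, which also reports near matches (for example, single amino acid substitutions) that an exact-substring check cannot distinguish from unrelated sequences.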
Pages: 5898-5908
Page count: 11