Bipartite matching generalizations for peptide identification in tandem mass spectrometry

被引:6
作者
Bai, Wenruo [1 ]
Bilmes, Jeffrey [1 ]
Noble, William S. [2 ,3 ]
机构
[1] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[3] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
来源
PROCEEDINGS OF THE 7TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS | 2016年
基金
美国国家卫生研究院; 美国国家科学基金会;
关键词
proteomics; submodularity; mass spectrometry; SUBMODULAR SET-FUNCTIONS; SHOTGUN PROTEOMICS; SPECTRAL DATA; ALGORITHM; DATABASE;
D O I
10.1145/2975167.2975201
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Motivation: Identification of spectra produced by a shotgun proteomics mass spectrometry experiment is commonly performed by searching the observed spectra against a peptide database. The heart of this search procedure is a score function that evaluates the quality of a hypothesized match between an observed spectrum and a theoretical spectrum corresponding to a particular peptide sequence. Accordingly, the success of a spectrum analysis pipeline depends critically upon this peptide-spectrum score function. Results: We developed peptide-spectrum score functions that compute the maximum value of a submodular function under m matroid constraints. We call this procedure a submodular generalized matching (SGM) since it generalizes bipartite matching. We use a greedy algorithm to compute maximization, which can achieve a solution whose objective is guaranteed to be at least 1/1+m of the true optimum. The advantage of the SGM framework is that known long-range properties of experimental spectra can be modeled by designing suitable submodular functions and matroid constraints. Experiments on four data sets from various organisms and mass spectrometry platforms show that the SGM approach leads to significantly improved performance compared to several state-of-the-art methods. Supplementary information, C++ source code, and data sets can be found at https://melodi- lab.github.io/SGM.
引用
收藏
页码:327 / 336
页数:10
相关论文
共 32 条
[1]  
[Anonymous], 1978, OPTIMIZATION TECHNIQ
[2]  
[Anonymous], 2001, COMP VIS 2001 ICCV 2
[3]  
[Anonymous], 2012, UNCERTAINTY ARTIFICI
[4]  
[Anonymous], 2010, NIPS
[5]   SUBMODULAR SET-FUNCTIONS, MATROIDS AND THE GREEDY ALGORITHM - TIGHT WORST-CASE BOUNDS AND SOME GENERALIZATIONS OF THE RADO-EDMONDS THEOREM [J].
CONFORTI, M ;
CORNUEJOLS, G .
DISCRETE APPLIED MATHEMATICS, 1984, 7 (03) :251-274
[6]   TANDEM: matching proteins with tandem mass spectra [J].
Craig, R ;
Beavis, RC .
BIOINFORMATICS, 2004, 20 (09) :1466-1467
[7]   Faster SEQUEST Searching for Peptide Identification from Tandem Mass Spectra [J].
Diament, Benjamin J. ;
Noble, William Stafford .
JOURNAL OF PROTEOME RESEARCH, 2011, 10 (09) :3871-3879
[8]   Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry [J].
Elias, Joshua E. ;
Gygi, Steven P. .
NATURE METHODS, 2007, 4 (03) :207-214
[9]   A fast SEQUEST cross correlation algorithm [J].
Eng, Jimmy K. ;
Fischer, Bernd ;
Grossmann, Jonas ;
MacCoss, Michael J. .
JOURNAL OF PROTEOME RESEARCH, 2008, 7 (10) :4598-4602
[10]   AN APPROACH TO CORRELATE TANDEM MASS-SPECTRAL DATA OF PEPTIDES WITH AMINO-ACID-SEQUENCES IN A PROTEIN DATABASE [J].
ENG, JK ;
MCCORMACK, AL ;
YATES, JR .
JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 1994, 5 (11) :976-989