Improved prediction of peptide detectability for targeted proteomics using a rank-based algorithm and organism-specific data

Times cited: 30
Authors
Qeli, Ermir [1]
Omasits, Ulrich [1,2]
Goetze, Sandra [1,2]
Stekhoven, Daniel J. [1]
Frey, Juerg E. [3]
Basler, Konrad [1]
Wollscheid, Bernd [2]
Brunner, Erich [1]
Ahrens, Christian H. [1,3]
Affiliations
[1] Univ Zurich, Inst Mol Life Sci, CH-8057 Zurich, Switzerland
[2] ETH, Inst Mol Syst Biol, CH-8093 Zurich, Switzerland
[3] Inst Plant Prod Sci, Res Grp Mol Diagnost Genom & Bioinformat, Agroscope, CH-8820 Wadenswil, Switzerland
Funding
Swiss National Science Foundation
Keywords
Targeted proteomics; Peptide detectability; Machine learning; Rank prediction algorithms; Proteotypic peptides; SRM; MASS-SPECTROMETRY; QUANTITATIVE PROTEOMICS; PROTEIN INFERENCE; GLOBAL ANALYSIS; QUANTIFICATION; REPRODUCIBILITY; IDENTIFICATION; ASSAYS; MODEL; EXPRESSION;
DOI
10.1016/j.jprot.2014.05.011
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The in silico prediction of the best-observable "proteotypic" peptides in mass spectrometry-based workflows is a challenging problem. Being able to accurately predict such peptides would enable the informed selection of proteotypic peptides for targeted quantification of previously observed and non-observed proteins for any organism, with a significant impact on clinical proteomics and systems biology studies. Current prediction algorithms rely on physicochemical parameters in combination with positive and negative training sets to identify those peptide properties that most profoundly affect their general detectability. Here we present PeptideRank, an approach that uses a learning-to-rank algorithm for peptide detectability prediction from shotgun proteomics data, and that eliminates the need to select a negative dataset for the training step. A large number of different peptide properties are used to train ranking models in order to predict a ranking of the best-observable peptides within a protein. Empirical evaluation with rank accuracy metrics showed that PeptideRank complements existing prediction algorithms. Our results indicate that the best performance is achieved when PeptideRank is trained on organism-specific shotgun proteomics data, and that it is most accurate for short to medium-sized and abundant proteins, without any loss in prediction accuracy for the important class of membrane proteins.

Biological significance

Targeted proteomics approaches have been gaining considerable momentum and hold immense potential for systems biology studies and clinical proteomics. However, since only very few complete proteomes have been reported to date, for a considerable fraction of a proteome there is no experimental proteomics evidence that could guide the selection of the best-suited proteotypic peptides (PTPs), i.e. peptides that are specific to a given proteoform and that are repeatedly observed in a mass spectrometer.
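The abstract states that PeptideRank ranks peptides within a protein and needs no negative training set, but it does not spell out the ranking learner. The sketch below illustrates the general pairwise learning-to-rank idea under stated assumptions: each peptide is a hypothetical numeric feature vector (standing in for peptide properties), its shotgun observation count defines the preference order among peptides of the same protein, and a plain pairwise perceptron stands in for whatever ranking model the authors actually used. It is not PeptideRank's implementation; all names and values are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical training record: (protein_id, feature_vector, observation_count).
# Features and counts are toy values, not data from the paper.
Record = Tuple[str, List[float], int]


def pairwise_differences(records: List[Record]) -> List[List[float]]:
    """Return x_more - x_less for every within-protein pair with unequal counts.

    Peptides are only compared with peptides of the same protein, so no
    global 'negative' (non-detectable) set is required: a peptide is simply
    less preferred than a more frequently observed sibling.
    """
    by_protein: Dict[str, List[Tuple[List[float], int]]] = {}
    for protein, features, count in records:
        by_protein.setdefault(protein, []).append((features, count))

    diffs: List[List[float]] = []
    for members in by_protein.values():
        for (xa, ca), (xb, cb) in combinations(members, 2):
            if ca == cb:
                continue  # ties carry no preference information
            more, less = (xa, xb) if ca > cb else (xb, xa)
            diffs.append([m - l for m, l in zip(more, less)])
    return diffs


def score(w: List[float], x: List[float]) -> float:
    """Linear ranking score w . x (higher means predicted more observable)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(diffs: List[List[float]], n_features: int,
                              epochs: int = 50, lr: float = 0.1) -> List[float]:
    """Update w whenever a preference pair is mis-ranked (w . d <= 0)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for d in diffs:
            if score(w, d) <= 0:
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w


# Toy example: two hypothetical proteins, two hypothetical features per peptide.
train = [
    ("P1", [0.9, 0.1], 12),
    ("P1", [0.5, 0.4], 5),
    ("P1", [0.1, 0.8], 0),
    ("P2", [0.8, 0.3], 7),
    ("P2", [0.2, 0.9], 1),
]
diffs = pairwise_differences(train)
w = train_pairwise_perceptron(diffs, n_features=2)

# Rank the peptides of an unseen protein by predicted score.
candidates = {"pepA": [0.7, 0.2], "pepB": [0.3, 0.6]}
ranking = sorted(candidates, key=lambda p: score(w, candidates[p]), reverse=True)
print(ranking)  # ['pepA', 'pepB']
```

Because every training signal is a relative preference inside one protein, the model only has to learn which peptide of a protein tends to be seen more often, which matches the within-protein ranking the abstract describes.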
We describe a novel, rank-based approach for the prediction of the best-suited PTPs for targeted proteomics applications. By building on methods developed in the field of information retrieval (e.g. the ranking methods used by web search engines, such as Google's PageRank), we circumvent the delicate step of selecting positive and negative training sets and at the same time more closely reflect the experimentalist's need to select, e.g., the five most promising peptides for targeting a protein of interest. This approach makes it possible to predict PTPs for proteins not yet observed or for organisms without prior experimental proteomics data, such as many non-model organisms.
(C) 2014 Elsevier B.V. All rights reserved.
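The practical output described in the abstract is a short list, e.g. the five top-ranked peptides per protein. The following sketch shows that selection step and one simple way to check it against shotgun observations: the fraction of the k most frequently observed peptides that also appear in the predicted top k. The peptide names, scores and counts are hypothetical, and this overlap measure is only an illustrative rank-accuracy metric; the abstract does not name the metrics used in the paper.

```python
from typing import Dict, List


def top_k(values: Dict[str, float], k: int = 5) -> List[str]:
    """Return up to k keys with the highest values (ties broken alphabetically)."""
    return sorted(values, key=lambda pep: (-values[pep], pep))[:k]


def top_k_overlap(predicted_scores: Dict[str, float],
                  observed_counts: Dict[str, int],
                  k: int = 5) -> float:
    """Fraction of the k most observed peptides that are also in the predicted top k.

    Peptides with a count of zero were never observed and are ignored.
    """
    observed = {pep: c for pep, c in observed_counts.items() if c > 0}
    observed_top = top_k(observed, k)
    if not observed_top:
        return 0.0
    predicted_top = set(top_k(predicted_scores, k))
    return len(predicted_top & set(observed_top)) / len(observed_top)


# Hypothetical predicted scores and shotgun observation counts for one protein.
scores = {"pep1": 0.9, "pep2": 0.7, "pep3": 0.4, "pep4": 0.2}
counts = {"pep1": 10, "pep2": 1, "pep3": 8, "pep4": 0}

print(top_k(scores, k=2))                  # ['pep1', 'pep2']
print(top_k_overlap(scores, counts, k=2))  # 0.5
```

Measuring overlap of top-k sets rather than a classification error reflects the experimentalist's question of whether the short list of targeted peptides is a good one, independent of how the remaining peptides are ordered.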
Pages: 269-283
Number of pages: 15