A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics

Cited by: 10
Authors
Halloran, John T. [1]
Rocke, David M. [2]
Affiliations
[1] Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA
[2] Univ Calif Davis, Div Biostat, Davis, CA 95616 USA
Funding
National Institutes of Health (USA)
Keywords
tandem mass spectrometry; machine learning; support vector machine; Percolator; TRON; sensitive peptide identification; false discovery rates; MS-GF+; shotgun proteomics; Newton method; accurate; database
DOI
10.1021/acs.jproteome.7b00767
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Percolator is an important tool for greatly improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations, rather than through heuristic approaches that require careful study of their impact on learned parameters across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, l2-SVM-MFN, large-scale SVM learning requires only about a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of l2-SVM-MFN, large-scale Percolator SVM learning is reduced to only about a fifth of the original runtime. Importantly, these speedups affect only the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both l2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under the Apache license at bitbucket.org/jthalloran/percolator_upgrade.
Pages: 1978-1982
Page count: 5
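
The abstract describes recalibrating peptide-spectrum matches with an SVM boundary learned between targets and decoys. The sketch below is only an illustration of that idea, not Percolator's implementation: it fits a linear SVM with a squared hinge loss (the loss the "l2" in l2-SVM-MFN is assumed to refer to) using plain gradient descent, which stands in for the Newton-type solvers (l2-SVM-MFN, TRON) named in the abstract. The data are synthetic, and the names `make_psms`, `train_l2_svm`, and `score` are hypothetical.

```python
import random


def make_psms(n_per_class=200, seed=0):
    """Synthetic two-feature PSMs: targets (+1) and decoys (-1). Illustrative only."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((1, 1.5), (-1, -1.5)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)])
            y.append(label)
    return X, y


def score(w, b, x):
    """Linear SVM score w.x + b, used here as the recalibrated PSM score."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_l2_svm(X, y, C=1.0, lr=0.05, epochs=300):
    """Minimize 0.5*||w||^2 + (C/n) * sum(max(0, 1 - y_i*(w.x_i + b))^2).

    Plain gradient descent is used for brevity; it is NOT l2-SVM-MFN or TRON.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of 0.5*||w||^2 is w
        for xi, yi in zip(X, y):
            margin = yi * score(w, b, xi)
            if margin < 1.0:  # only margin violators contribute to the loss
                coef = -2.0 * C * (1.0 - margin) * yi / n
                for j in range(d):
                    grad_w[j] += coef * xi[j]
                grad_b += coef
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    X, y = make_psms()
    w, b = train_l2_svm(X, y)
    correct = sum(1 for xi, yi in zip(X, y) if yi * score(w, b, xi) > 0)
    print("weights:", w, "bias:", b)
    print("training accuracy:", correct / len(X))
```

Because the squared hinge objective is convex, a faster solver would be expected to reach the same minimizer in fewer iterations, which is consistent with the abstract's statement that the speedups do not alter recalibration performance.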