A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics

Cited by: 10
Authors
Halloran, John T. [1]
Rocke, David M. [2]
Affiliations
[1] Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA
[2] Univ Calif Davis, Div Biostat, Davis, CA 95616 USA
Funding
National Institutes of Health (USA)
Keywords
tandem mass spectrometry; machine learning; support vector machine; percolator; TRON; SENSITIVE PEPTIDE IDENTIFICATION; FALSE DISCOVERY RATES; MS-GF PLUS; SHOTGUN PROTEOMICS; NEWTON METHOD; ACCURATE; DATABASE;
DOI
10.1021/acs.jproteome.7b00767
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Percolator is an important tool for improving the results of a database search and the downstream analysis that follows. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To reduce analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations, rather than through heuristic approaches whose impact on the learned parameters would have to be studied carefully across different search settings and data sets. We show that optimizing Percolator's original learning algorithm, L2-SVM-MFN, cuts large-scale SVM learning to roughly a third of the original runtime. Furthermore, we show that replacing L2-SVM-MFN with the widely used Trust Region Newton (TRON) algorithm cuts large-scale Percolator SVM learning to roughly a fifth of the original runtime. Importantly, these speedups affect only how quickly Percolator converges to a global solution; they do not alter recalibration performance. The upgraded versions of both L2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-threaded use and are available under an Apache license at bitbucket.org/jthalloran/percolator_upgrade.
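As a rough illustration of the kind of problem the abstract refers to, the sketch below fits a linear SVM with squared hinge (L2) loss to synthetic "target" and "decoy" feature vectors and then rescores them with the learned weights. It is a minimal sketch under stated assumptions, not Percolator's code: the data are random, the function names (`objective`, `gradient`, `train`) are hypothetical, and the solver is plain gradient descent with a backtracking line search rather than L2-SVM-MFN or TRON, which are Newton-type methods. Because the objective is convex, a faster solver changes how quickly the minimizer is reached, not which minimizer is reached.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def objective(w, X, y, C):
    # 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * <w, x_i>)^2
    loss = sum(max(0.0, 1.0 - yi * dot(w, xi)) ** 2 for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * loss

def gradient(w, X, y, C):
    g = list(w)  # the gradient of the regularizer is w itself
    for xi, yi in zip(X, y):
        slack = 1.0 - yi * dot(w, xi)
        if slack > 0.0:  # only margin violators contribute to the loss
            for j, xij in enumerate(xi):
                g[j] -= 2.0 * C * slack * yi * xij
    return g

def train(X, y, C=1.0, max_iter=200, tol=1e-8):
    # Plain gradient descent (an illustrative stand-in, not L2-SVM-MFN or TRON).
    w = [0.0] * len(X[0])
    for _ in range(max_iter):
        g = gradient(w, X, y, C)
        g2 = dot(g, g)
        if g2 < tol:
            break
        f0, step = objective(w, X, y, C), 1.0
        # Backtracking line search: halve the step until the objective
        # decreases sufficiently (Armijo condition).
        while step > 1e-12:
            w_new = [wj - step * gj for wj, gj in zip(w, g)]
            if objective(w_new, X, y, C) <= f0 - 1e-4 * step * g2:
                w = w_new
                break
            step *= 0.5
        else:
            break  # no acceptable step found; stop
    return w

# Synthetic data (assumption): two score-like features plus a constant bias term.
rng = random.Random(0)
targets = [[rng.gauss(2.0, 1.0), rng.gauss(1.0, 1.0), 1.0] for _ in range(100)]
decoys = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), 1.0] for _ in range(100)]
X = targets + decoys
y = [1.0] * len(targets) + [-1.0] * len(decoys)
C = 1.0

w = train(X, y, C)

# "Recalibration" in this toy setting: rescore every match with the learned weights.
target_scores = [dot(w, x) for x in targets]
decoy_scores = [dot(w, x) for x in decoys]
print("learned weights:", w)
print("mean target score:", sum(target_scores) / len(target_scores))
print("mean decoy score:", sum(decoy_scores) / len(decoy_scores))
```

In this toy setting the learned weight vector defines the decision boundary, and the rescored values `dot(w, x)` stand in for recalibrated match scores.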
Pages: 1978-1982
Page count: 5