A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics

Cited by: 10
Authors
Halloran, John T. [1]
Rocke, David M. [2]
Affiliations
[1] Univ Calif Davis, Dept Publ Hlth Sci, Davis, CA 95616 USA
[2] Univ Calif Davis, Div Biostat, Davis, CA 95616 USA
Funding
National Institutes of Health (USA)
Keywords
tandem mass spectrometry; machine learning; support vector machine; percolator; TRON; SENSITIVE PEPTIDE IDENTIFICATION; FALSE DISCOVERY RATES; MS-GF PLUS; SHOTGUN PROTEOMICS; NEWTON METHOD; ACCURATE; DATABASE;
DOI
10.1021/acs.jproteome.7b00767
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Percolator is an important tool for greatly improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches (PSMs) based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations rather than heuristic approaches, which would require careful study of their impact on learned parameters across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, L2-SVM-MFN, large-scale SVM learning requires only about a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of L2-SVM-MFN, large-scale Percolator SVM learning is reduced to only about a fifth of the original runtime. Importantly, these speedups affect only the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both L2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under the Apache license at bitbucket.org/jthalloran/percolator_upgrade.
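To make the recalibration idea concrete, the following is a minimal sketch of fitting a linear SVM with squared hinge (L2) loss to separate targets from decoys and then rescoring each PSM by its decision value. It is not Percolator's implementation: plain gradient descent stands in for L2-SVM-MFN and TRON, the loss is averaged over the examples for a stable step size, and the feature values, function names, and hyperparameters are all hypothetical. Percolator's iterative, cross-validated training procedure is also omitted.

```python
# Toy PSM feature vectors (hypothetical values, two search-score features each).
# The last target mimics an incorrect identification that resembles a decoy.
targets = [(3.0, 2.5), (2.5, 3.0), (3.5, 2.0), (2.0, 2.8), (0.6, 0.4)]
decoys = [(0.5, 0.3), (0.2, 0.8), (0.9, 0.1), (0.4, 0.6), (0.7, 0.5)]

X = targets + decoys
y = [1.0] * len(targets) + [-1.0] * len(decoys)  # +1 = target, -1 = decoy


def decision_value(w, b, x):
    """Signed distance-like score w.x + b used to re-rank a PSM."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def objective(w, b, X, y, C):
    """0.5*||w||^2 + (C/n) * sum of squared hinge losses (assumed scaling)."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = sum(max(0.0, 1.0 - yi * decision_value(w, b, xi)) ** 2
               for xi, yi in zip(X, y))
    return reg + C * loss / len(X)


def train_l2_svm(X, y, C=1.0, lr=0.05, epochs=2000):
    """Minimize the objective with fixed-step gradient descent.

    A stand-in for the Newton-type solvers named in the abstract; the
    objective is convex, so all of them target the same global solution.
    """
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularizer is w itself
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = 1.0 - yi * decision_value(w, b, xi)
            if margin > 0.0:  # only margin violators contribute
                coef = -2.0 * C * margin * yi / n
                for j, xj in enumerate(xi):
                    grad_w[j] += coef * xj
                grad_b += coef
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


C = 1.0
initial_obj = objective([0.0, 0.0], 0.0, X, y, C)
w, b = train_l2_svm(X, y, C=C)
final_obj = objective(w, b, X, y, C)

# Recalibrated scores: higher values indicate more target-like PSMs.
rescored = sorted(((decision_value(w, b, x), label) for x, label in zip(X, y)),
                  reverse=True)
for score, label in rescored:
    print(f"{'target' if label > 0 else 'decoy':6s} {score:+.3f}")
```

Because the objective is convex, swapping the solver changes how quickly the minimum is reached but not which minimum is found, which is consistent with the abstract's statement that the speedups leave recalibration performance unchanged.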
Pages: 1978-1982
Page count: 5