Deep learning for peptide identification from metaproteomics datasets

Cited by: 9
Authors
Feng, Shichao [1]
Sterzenbach, Ryan [2]
Guo, Xuan [1]
Affiliations
[1] Univ North Texas, Dept Comp Sci & Engn, 3940 N Elm St,Ste F290, Denton, TX 76207 USA
[2] Univ North Texas, Dept Biomed Engn, Denton, TX 76203 USA
Funding
National Institutes of Health (USA)
Keywords
Peptide identification; Deep learning; Tandem mass spectrometry; CNN; PROTEIN IDENTIFICATION; STATISTICAL-MODEL; MS/MS; CONFIDENCE; CHALLENGES; REVEALS;
DOI
10.1016/j.jprot.2021.104316
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Metaproteomics is becoming widely used in microbiome research to gain insight into the functional state of microbial communities. Current metaproteomics studies are generally based on high-throughput tandem mass spectrometry (MS/MS) coupled with liquid chromatography. In this paper, we propose a deep-learning-based algorithm, named DeepFilter, for improving peptide identification from a collection of tandem mass spectra. The key advantage of DeepFilter is that, unlike existing filtering tools, it does not need ad hoc training or fine-tuning. DeepFilter is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DeepFilter.
Significance: The identification of peptides and proteins from MS data involves searching MS/MS spectra against a predefined protein sequence database and assigning top-scored peptides to spectra. Existing computational tools are still far from able to extract all the information from MS/MS data sets acquired from metaproteome samples. Systematic experimental results demonstrate that DeepFilter identified up to 12% and 9% more peptide-spectrum matches and proteins, respectively, than existing filtering algorithms, including Percolator, Q-ranker, PeptideProphet, and iProphet, on marine and soil microbial metaproteome samples at a false discovery rate of 1%. The taxonomic analysis shows that DeepFilter found up to 7%, 10%, and 14% more species from marine, soil, and human gut samples than existing filtering algorithms. Therefore, DeepFilter is believed to generalize well to new, previously unseen peptide-spectrum matches and can be readily applied to peptide identification from metaproteomics data.
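For readers unfamiliar with what filtering peptide-spectrum matches (PSMs) at a 1% false discovery rate involves, the following is a minimal sketch of score thresholding with target-decoy FDR estimation, a common convention in the field. It is not DeepFilter's method: the abstract does not say how the FDR is estimated, and the `PSM` class, its field names, and `filter_at_fdr` are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical record for one peptide-spectrum match."""
    spectrum_id: str
    peptide: str
    score: float      # assumption: a higher score means a better match
    is_decoy: bool    # assumption: True if the peptide came from a decoy database


def filter_at_fdr(psms, fdr_threshold=0.01):
    """Return the target PSMs accepted at the given estimated FDR.

    The FDR of each score-ranked prefix is estimated as
    (number of decoys) / (number of targets); the longest prefix whose
    estimate stays at or below the threshold is kept.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = 0
    decoys = 0
    cutoff = 0  # how many top-ranked PSMs to keep
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr_threshold:
            cutoff = i
    # Decoys are only used for estimation and are dropped from the result.
    return [p for p in ranked[:cutoff] if not p.is_decoy]
```

In this framing, a filtering tool supplies the `score` values; a better scoring function pushes more correct matches above the decoys, so more PSMs pass the same 1% threshold.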
Pages: 14