PepExplorer: A Similarity-driven Tool for Analyzing de Novo Sequencing Results

Cited by: 31
Authors
Leprevost, Felipe V. [1]
Valente, Richard H. [2, 3]
Lima, Diogo B. [1]
Perales, Jonas [2, 3]
Melani, Rafael [4]
Yates, John R., III [5]
Barbosa, Valmir C. [6]
Junqueira, Magno [4]
Carvalho, Paulo C. [1]
Affiliations
[1] Fiocruz MS, Carlos Chagas Inst, Lab Prote & Prot Engn, Fiocruz, Parana, Brazil
[2] Fiocruz MS, Inst Oswaldo Cruz, Lab Toxinol, BR-21045900 Rio De Janeiro, Brazil
[3] CNPq, Inst Nacl Ciencia & Tecnol Toxinas INCTTox, Brasilia, DF, Brazil
[4] Univ Fed Rio de Janeiro, Dept Biochem, Prote Unit, Rio de Janeiro Prote Network, Rio De Janeiro, Brazil
[5] Scripps Res Inst, Dept Physiol Chem, La Jolla, CA 92037 USA
[6] Univ Fed Rio de Janeiro, Syst Engn & Comp Sci Program, Rio De Janeiro, Brazil
Keywords
PROTEIN IDENTIFICATION; MASS-SPECTROMETRY; PEPTIDE IDENTIFICATION; TANDEM; DISCOVERY; PROTEOMICS; INHIBITOR; ALGORITHM; DATABASES; SPECTRA
DOI
10.1074/mcp.M113.037002
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Peptide spectrum matching is the current gold standard for protein identification via mass-spectrometry-based proteomics. Peptide spectrum matching compares experimental mass spectra against theoretical spectra generated from a protein sequence database to perform identification, but protein sequences not present in a database cannot be identified unless their sequences are in part conserved. The alternative approach, de novo sequencing, can make it possible to infer a peptide sequence directly from a mass spectrum, but interpreting long lists of peptide sequences resulting from large-scale experiments is not trivial. With this as motivation, PepExplorer was developed to use rigorous pattern recognition to assemble a list of homologue proteins using de novo sequencing data coupled to sequence alignment to allow biological interpretation of the data. PepExplorer can read the output of various widely adopted de novo sequencing tools and converge to a list of proteins with a global false-discovery rate. To this end, it employs a radial basis function neural network that considers precursor charge states, de novo sequencing scores, peptide lengths, and alignment scores to select similar protein candidates, from a target-decoy database, usually obtained from phylogenetically related species. Alignments are performed using a modified Smith-Waterman algorithm tailored for the task at hand. We verified the effectiveness of our approach using a reference set of identifications generated by ProLuCID when searching for Pyrococcus furiosus mass spectra on the corresponding NCBI RefSeq database. We then modified the sequence database by swapping amino acids until ProLuCID was no longer capable of identifying any proteins. By searching the mass spectra using PepExplorer on the modified database, we were able to recover most of the identifications at a 1% false-discovery rate. 
Finally, we employed PepExplorer to disclose a comprehensive proteomic assessment of the Bothrops jararaca plasma, a known biological source of natural inhibitors of snake toxins. PepExplorer is integrated into the PatternLab for Proteomics environment, which makes available various tools for downstream data analysis, including resources for quantitative and differential proteomics.
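The abstract mentions that alignments rely on a modified Smith-Waterman algorithm but does not describe the modification. As a point of reference only, the following is a minimal sketch of the textbook Smith-Waterman local alignment with a linear gap penalty. The function name `smith_waterman` and the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions and are not taken from PepExplorer.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Textbook Smith-Waterman local alignment (NOT PepExplorer's modified version).

    Scoring values are illustrative assumptions.
    Returns (best_score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best local alignment score ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # A de novo peptide with one wrong residue (L instead of I) still aligns
    # locally to a hypothetical database peptide.
    print(smith_waterman("PEPTLDE", "PEPTIDE"))
```

A local alignment suits this setting because a de novo sequence typically covers only part of a protein and may contain residue errors. A production scoring scheme would more plausibly use a substitution matrix and treat isobaric residues such as I and L as equivalent, which this sketch does not attempt.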
Pages: 2480-2489
Number of pages: 10
Related papers
9 in total
  • [1] UniNovo: a universal tool for de novo peptide sequencing
    Jeong, Kyowon
    Kim, Sangtae
    Pevzner, Pavel A.
    BIOINFORMATICS, 2013, 29 (16) : 1953 - 1962
  • [2] AUDENS: A tool for automated peptide de novo sequencing
    Grossmann, J
    Roos, FF
    Cieliebak, M
    Lipták, Z
    Mathis, LK
    Müller, M
    Gruissem, W
    Baginsky, S
    JOURNAL OF PROTEOME RESEARCH, 2005, 4 (05) : 1768 - 1774
  • [3] Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification?
    Muth, Thilo
    Renard, Bernhard Y.
    BRIEFINGS IN BIOINFORMATICS, 2018, 19 (05) : 954 - 970
  • [4] PRIME: A mass spectrum data mining tool for de novo sequencing and PTMs identification
    Yan, B
    Qu, YX
    Mao, FL
    Olman, VN
    Xu, Y
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2005, 20 (04) : 483 - 490
  • [5] DNMSO; an ontology for representing de novo sequencing results from Tandem-MS data
    Takan, Savas
    Allmer, Jens
    PEERJ, 2020, 8
  • [6] SWPepNovo: An Efficient De Novo Peptide Sequencing Tool for Large-scale MS/MS Spectra Analysis
    Li, Chuang
    Li, Kenli
    Li, Keqin
    Xie, Xianghui
    Lin, Feng
    INTERNATIONAL JOURNAL OF BIOLOGICAL SCIENCES, 2019, 15 (09) : 1787 - 1801
  • [7] Mining Novel Allergens from Coconut Pollen Employing Manual De Novo Sequencing and Homology-Driven Proteomics
    Saha, Bodhisattwa
    Sircar, Gaurab
    Pandey, Naren
    Bhattacharya, Swati Gupta
    JOURNAL OF PROTEOME RESEARCH, 2015, 14 (11) : 4823 - 4833
  • [8] Homology-Driven Proteomics of Dinoflagellates with Unsequenced Genomes Using MALDI-TOF/TOF and Automated De Novo Sequencing
    Wang, Da-Zhi
    Li, Cheng
    Xie, Zhang-Xian
    Dong, Hong-Po
    Lin, Lin
    Hong, Hua-Sheng
    EVIDENCE-BASED COMPLEMENTARY AND ALTERNATIVE MEDICINE, 2011, 2011
  • [9] Survey of Pseudomonas aeruginosa and its phages: de novo peptide sequencing as a novel tool to assess the diversity of worldwide collected viruses
    Ceyssens, Pieter-Jan
    Noben, Jean-Paul
    Ackermann, Hans-W.
    Verhaegen, Jan
    De Vos, Daniel
    Pirnay, Jean-Paul
    Merabishvili, Maia
    Vaneechoutte, Mario
    Chibeu, Andrew
    Volckaert, Guido
    Lavigne, Rob
    ENVIRONMENTAL MICROBIOLOGY, 2009, 11 (05) : 1303 - 1313