MScDB: A Mass Spectrometry-centric Protein Sequence Database for Proteomics

Cited by: 7
Authors
Marx, Harald [1 ,2 ]
Lemeer, Simone [1 ]
Klaeger, Susan [1 ]
Rattei, Thomas [3 ]
Kuster, Bernhard [1 ,4 ]
Affiliations
[1] Tech Univ Munich, Chair Prote & Bioanalyt, D-85354 Freising Weihenstephan, Germany
[2] Tech Univ Munich, Chair Genome Oriented Bioinformat, D-85354 Freising Weihenstephan, Germany
[3] Univ Vienna, Dept Computat Syst Biol, A-1090 Vienna, Austria
[4] CIPSM, D-85354 Freising Weihenstephan, Germany
Keywords
proteomics; protein identification; mass spectrometry; protein sequence database; protein inference; peptide-centric clustering; STATISTICAL-MODEL; GENE-EXPRESSION; ALGORITHM; INFERENCE; METAPROTEOMICS; IDENTIFICATION; ANNOTATION; ACCURACY; INDEX; SET;
DOI
10.1021/pr400215r
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Protein sequence databases are indispensable tools for life science research, including mass spectrometry (MS)-based proteomics. In current database construction processes, sequence similarity clustering is used to reduce redundancies in the source data. Albeit powerful, it ignores the peptide-centric nature of proteomic data and the fact that MS is able to distinguish similar sequences. Therefore, we introduce an approach that structures the protein sequence space at the peptide level using theoretical and empirical information from large-scale proteomic data to generate a mass spectrometry-centric protein sequence database (MScDB). The core modules of MScDB are an in-silico proteolytic digest and a peptide-centric clustering algorithm that groups protein sequences that are indistinguishable by mass spectrometry. Analysis of various MScDB use cases against five complex human proteomes resulted in 69 peptide identifications not present in UniProtKB as well as 79 putative single amino acid polymorphisms. MScDB retains ~99% of the identifications in comparison to common databases despite a 3-48% increase in the theoretical peptide search space (but comparable protein sequence space). In addition, MScDB enables cross-species applications such as human/mouse graft models, and our results suggest that the uncertainty in protein assignments to one species can be smaller than 20%.
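The two core modules named in the abstract, an in-silico proteolytic digest and a peptide-centric clustering step, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tryptic_digest` and `cluster_by_peptides`, the trypsin cleavage rule (cleave after K or R unless followed by P), the minimum peptide length of 6, the absence of missed cleavages, and the choice to group only proteins with identical peptide sets are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_length=6):
    """Return the set of fully cleaved tryptic peptides of a protein.

    Assumed rule: cleave C-terminal to K or R, except before P.
    No missed cleavages are generated (a simplification).
    """
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    # Discard short peptides, which are rarely informative in MS searches.
    return {p for p in peptides if len(p) >= min_length}


def cluster_by_peptides(proteins, min_length=6):
    """Group proteins whose digests yield exactly the same peptide set.

    `proteins` maps an identifier to its amino acid sequence.
    Proteins in one group cannot be told apart from their peptides alone.
    """
    groups = defaultdict(list)
    for name, sequence in proteins.items():
        key = frozenset(tryptic_digest(sequence, min_length))
        groups[key].append(name)
    return [sorted(names) for names in groups.values()]


# Hypothetical toy sequences, not taken from any real database.
toy_proteins = {
    "P1": "MAAAAAKGGGGGGR",
    "P2": "GGGGGGRMAAAAAK",
    "P3": "MAAAAAKCCCCCCR",
}
clusters = cluster_by_peptides(toy_proteins)
```

In this toy input, P1 and P2 differ in sequence but produce the same two peptides, so they fall into one group, while P3 carries a distinguishing peptide and stays separate. A similarity-based clustering would treat these cases differently, which is the contrast the abstract draws.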
Pages: 2386-2398
Page count: 13
Related papers
50 in total
  • [1] Native Mass Spectrometry-Centric Approaches Revealed That Neuropeptides Frequently Interact with Amyloid-β
    Wang, Danyang
    Wang, Guibin
    Wang, Xiankun
    Ren, Zhenhua
    Jia, Chenxi
    ACS CHEMICAL NEUROSCIENCE, 2024, 15 (15) : 2719 - 2728
  • [2] Protein Identification by Database Searching of Mass Spectrometry Data in the Teaching of Proteomics
    Marquioni, Vinicius
    Franco Nunes, Francis Morais
    Marques Novo-Mansur, Maria Teresa
    JOURNAL OF CHEMICAL EDUCATION, 2021, 98 (03) : 812 - 823
  • [3] Database Searching in Mass Spectrometry Based Proteomics
    Kertesz-Farkas, Attila
    Reiz, Beata
    Myers, Michael P.
    Pongor, Sandor
    CURRENT BIOINFORMATICS, 2012, 7 (02) : 221 - 230
  • [4] Database normalization is crucial for reliable protein identification in mass spectrometry-based proteomics
    Has, Canan
    Mungan, Mehmet Direnc
    Ciftci, Cansu
    Allmer, Jens
    AMINO ACIDS, 2016, 48 (02) : 623 - 624
  • [5] Combining mass spectrometry with database interrogation strategies in proteomics
    Liska, AJ
    Shevchenko, A
    TRAC-TRENDS IN ANALYTICAL CHEMISTRY, 2003, 22 (05) : 291 - +
  • [6] Correlation between peak capacity and protein sequence coverage in proteomics analysis by liquid chromatography-mass spectrometry/mass spectrometry
    Fairchild, Jacob N.
    Walworth, Matthew J.
    Horvath, Krisztian
    Guiochon, Georges
    JOURNAL OF CHROMATOGRAPHY A, 2010, 1217 (29) : 4779 - 4783
  • [7] SpecDB: A database for storing and managing mass spectrometry proteomics data
    Cannataro, M
    Veltri, P
    FUZZY LOGIC AND APPLICATIONS, 2006, 3849 : 236 - 245
  • [8] Protein and peptide identification: the role of mass spectrometry in proteomics
    Ashcroft, AE
    NATURAL PRODUCT REPORTS, 2003, 20 (02) : 202 - 215
  • [9] Benchmarking mass spectrometry based proteomics algorithms using a simulated database
    Muaaz Gul Awan
    Abdullah Gul Awan
    Fahad Saeed
    Network Modeling Analysis in Health Informatics and Bioinformatics, 2021, 10
  • [10] Mass spectrometry in proteomics: Studies of protein interaction.
    Roepstorff, P
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2005, 229 : U150 - U151