Cloud Parallel Processing of Tandem Mass Spectrometry Based Proteomics Data

Cited by: 26
Authors
Mohammed, Yassene [1, 2, 3]
Mostovenko, Ekaterina [1]
Henneman, Alex A. [1]
Marissen, Rob J. [1]
Deelder, Andre M. [1]
Palmblad, Magnus [1]
Affiliations
[1] Leiden University Medical Center, Department of Parasitology, Biomolecular Mass Spectrometry Unit, NL-2300 RA Leiden, Netherlands
[2] Leibniz University Hannover, Distributed Computing & Security Group, D-30167 Hannover, Germany
[3] Leibniz University Hannover, L3S Research Center, D-30167 Hannover, Germany
Keywords
proteomics; mass spectrometry; scientific workflow; data decomposition; peptide identification; spectra; MapReduce; sequences; X!Tandem; MS/MS; ETD
DOI
10.1021/pr300561q
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Data analysis in mass spectrometry-based proteomics struggles to keep pace with advances in instrumentation and the increasing rate of data acquisition. Analyzing these data involves multiple steps that require diverse software using different algorithms and data formats. The speed and performance of mass spectral search engines are continuously improving, although not necessarily fast enough to meet the challenge posed by the large volumes of data now being acquired. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing the identification of tandem mass spectra using data decomposition, which keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms: one for decomposing mzXML files and one for recomposing the resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power, and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics, and we present three scientific workflows for parallel sequence database search as well as spectral library search, using our data decomposition programs together with X!Tandem and SpectraST.
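The decompose, search and recompose pattern described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a simplified, namespace-free mzXML layout with flat `<scan>` elements (real mzXML files also carry an XML namespace, a byte-offset index, a checksum and possibly nested scans, none of which are handled here), and a reduced pepXML-like output containing only `spectrum_query` elements. The names `decompose`, `toy_search`, `recompose` and `parallel_search` are hypothetical; `toy_search` merely stands in for a search engine such as X!Tandem or SpectraST and assigns no peptides.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def decompose(mzxml: str, n_parts: int) -> list[str]:
    """Split a simplified, namespace-free mzXML document into at most
    n_parts documents, each holding a contiguous block of <scan> elements."""
    root = ET.fromstring(mzxml)
    run = root.find("msRun")
    scans = run.findall("scan")
    block = max(1, -(-len(scans) // n_parts))  # ceiling division
    parts = []
    for start in range(0, len(scans), block):
        chunk = scans[start:start + block]
        part_root = ET.Element(root.tag, dict(root.attrib))
        part_run = ET.SubElement(part_root, "msRun", dict(run.attrib))
        part_run.set("scanCount", str(len(chunk)))  # keep header consistent
        part_run.extend(chunk)
        parts.append(ET.tostring(part_root, encoding="unicode"))
    return parts


def toy_search(part: str) -> str:
    """Hypothetical stand-in for a search engine (e.g. X!Tandem): emits one
    pepXML-like <spectrum_query> per MS/MS scan, indexed locally from 1."""
    run = ET.fromstring(part).find("msRun")
    out = ET.Element("msms_pipeline_analysis")
    summary = ET.SubElement(out, "msms_run_summary")
    ms2 = [s for s in run.findall("scan") if s.get("msLevel") == "2"]
    for local_index, scan in enumerate(ms2, start=1):
        num = scan.get("num")
        ET.SubElement(summary, "spectrum_query",
                      {"start_scan": num, "end_scan": num,
                       "index": str(local_index)})
    return ET.tostring(out, encoding="unicode")


def recompose(pepxml_parts: list[str]) -> str:
    """Merge per-part pepXML-like results into one document, ordering the
    queries by scan number and renumbering the index attribute globally."""
    queries = []
    for part in pepxml_parts:
        summary = ET.fromstring(part).find("msms_run_summary")
        queries.extend(summary.findall("spectrum_query"))
    queries.sort(key=lambda q: int(q.get("start_scan")))
    out = ET.Element("msms_pipeline_analysis")
    summary = ET.SubElement(out, "msms_run_summary")
    for global_index, query in enumerate(queries, start=1):
        query.set("index", str(global_index))
        summary.append(query)
    return ET.tostring(out, encoding="unicode")


def parallel_search(mzxml: str, n_parts: int) -> str:
    """Decompose, search the parts concurrently, then recompose."""
    parts = decompose(mzxml, n_parts)
    # A thread pool is used only for illustration; a real engine would run
    # as an external process, or each part would go to a separate cloud node.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(toy_search, parts))
    return recompose(results)


if __name__ == "__main__":
    # Nine scans: scans 1, 4 and 7 are MS1, the rest are MS/MS.
    scans = "".join(f'<scan num="{n}" msLevel="{1 if n % 3 == 1 else 2}"/>'
                    for n in range(1, 10))
    doc = f'<mzXML><msRun scanCount="9">{scans}</msRun></mzXML>'
    print(parallel_search(doc, n_parts=3))
```

Splitting into contiguous blocks keeps each part's scan numbers in a narrow range, so recomposition in this sketch only has to sort by `start_scan` and renumber `index` for the merged file to read as a single run. The search engine itself is never modified, which is the property the abstract emphasizes.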
Pages: 5101-5108
Page count: 8