Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data

Cited by: 1
Authors
Bob, Konstantin [1]
Teschner, David [1]
Kemmer, Thomas [1]
Gomez-Zepeda, David [2, 3]
Tenzer, Stefan [2, 3]
Schmidt, Bertil [1]
Hildebrandt, Andreas [1]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Inst Comp Sci, D-55128 Mainz, Germany
[2] Johannes Gutenberg Univ Mainz, Inst Immunol, Univ Med Ctr, D-55128 Mainz, Germany
[3] Helmholtz Inst Translat Oncol HITRON Mainz, Immunoprote Unit, D-55131 Mainz, Germany
Keywords
Mass spectrometry; Locality-sensitive hashing; Signal processing; PEPTIDE IDENTIFICATION; PROTEOMICS; RANGE
DOI
10.1186/s12859-022-04833-5
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions concerning the signals' properties. Results: In this study, it is shown that locality-sensitive hashing enables signal classification in mass spectrometry raw data at scale. Through appropriate choice of algorithm parameters it is possible to balance false-positive and false-negative rates. On synthetic data, a superior performance compared to an intensity thresholding approach was achieved. Real data could be strongly reduced without losing relevant information. Our implementation scaled out up to 32 threads and supports acceleration by GPUs. Conclusions: Locality-sensitive hashing is a desirable approach for signal classification in mass spectrometry raw data. Availability: Generated data and code are available at https://github.com/hildebrandtlab/mzBucket. Raw data is available at https://zenodo.org/record/5036526.
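The abstract's remark that algorithm parameters balance false-positive and false-negative rates can be illustrated with a generic locality-sensitive hashing sketch. This is not the authors' implementation: the hash family (random hyperplanes), the function names, and the parameters k (bits per key) and l (number of tables) are assumptions chosen for illustration only.

```python
import math
import random


def make_hyperplanes(dim, k, rng):
    """Draw k random hyperplane normals of dimension dim (hypothetical helper)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def signature(vec, hyperplanes):
    """k-bit key: one sign bit per hyperplane (random-hyperplane hashing)."""
    return tuple(
        1 if sum(v * h for v, h in zip(vec, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def collision_probability(cos_sim):
    """Chance that a single random-hyperplane bit agrees for two vectors."""
    theta = math.acos(max(-1.0, min(1.0, cos_sim)))
    return 1.0 - theta / math.pi


def amplified_probability(p, k, l):
    """Chance that two items share a full k-bit key in at least one of l tables."""
    return 1.0 - (1.0 - p ** k) ** l


if __name__ == "__main__":
    rng = random.Random(0)
    planes = make_hyperplanes(dim=4, k=6, rng=rng)
    print(signature([1.0, 2.0, 0.5, 0.0], planes))
    # Larger k makes accidental collisions rarer (fewer false positives);
    # larger l gives similar items more chances to collide (fewer false negatives).
    for cos_sim in (0.95, 0.5):
        p = collision_probability(cos_sim)
        print(cos_sim, amplified_probability(p, k=8, l=4))
```

Under these assumptions, raising k suppresses collisions between dissimilar items while raising l recovers collisions between similar ones, which is the general trade-off the abstract alludes to.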
Pages: 16