Extremely Fast and Accurate Open Modification Spectral Library Searching of High-Resolution Mass Spectra Using Feature Hashing and Graphics Processing Units

Cited by: 33
Authors
Bittremieux, Wout [1 ,2 ,3 ]
Laukens, Kris [1 ,2 ]
Noble, William Stafford [3 ,4 ]
Affiliations
[1] Univ Antwerp, Dept Math & Comp Sci, B-2020 Antwerp, Belgium
[2] Biomed Informat Network Antwerpen Biomina, B-2020 Antwerp, Belgium
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[4] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
mass spectrometry; proteomics; open modification searching; spectral library; post-translational modification; approximate nearest neighbor indexing; graphics processing unit; feature hashing; PEPTIDE IDENTIFICATION; SPECTROMETRY; DATABASE; PROTEOME;
DOI
10.1021/acs.jproteome.9b00291
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is the large computational cost, because each query spectrum has to be compared against a multitude of candidate peptides. We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. On the basis of these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding about the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo based on a large data set of the draft human proteome. ANN-SoLo is implemented in Python and C++.
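The feature hashing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not ANN-SoLo's actual code: the function name `hash_spectrum`, the bin width of 0.05 m/z, the vector length of 800, and the use of CRC32 as the hash function are all assumptions chosen for illustration. The sketch assigns each peak to a narrow m/z bin, hashes the bin index to one of a small number of slots, and sums intensities per slot, so that a high-resolution spectrum becomes a short vector suitable for nearest neighbor indexing.

```python
import math
import zlib


def hash_spectrum(peaks, dim=800, bin_width=0.05):
    """Hash (m/z, intensity) peaks into a unit-length vector of length `dim`.

    `dim` and `bin_width` are illustrative assumptions, not the tool's settings.
    """
    vector = [0.0] * dim
    for mz, intensity in peaks:
        bin_index = int(mz / bin_width)  # narrow, high-resolution m/z bin
        # CRC32 stands in for whatever hash function a real tool would use.
        slot = zlib.crc32(bin_index.to_bytes(8, "little")) % dim
        vector[slot] += intensity  # bins that collide are summed
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two hypothetical spectra with the same fragment m/z values.
query = [(175.119, 0.4), (262.151, 1.0), (375.235, 0.7)]
library = [(175.119, 0.5), (262.151, 0.9), (375.235, 0.6)]

q_vec = hash_spectrum(query)
l_vec = hash_spectrum(library)
similarity = dot(q_vec, l_vec)
```

Because both vectors have unit length, their dot product is a cosine similarity, which is the kind of score a nearest neighbor index can use to shortlist candidate library spectra before a more expensive comparison.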
Pages: 3792-3799
Number of pages: 8