Speeding up tandem mass spectrometry-based database searching by longest common prefix

Cited by: 6
Authors
Zhou, Chen [1, 2, 3]
Chi, Hao [1, 2, 3]
Wang, Le-Heng [1, 2]
Li, You [1, 2]
Wu, Yan-Jie [1, 2, 3]
Fu, Yan [1, 2]
Sun, Rui-Xiang [1, 2]
He, Si-Min [1, 2]
Affiliations
[1] Chinese Acad Sci, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
[3] Chinese Acad Sci, Grad Univ, Beijing 100049, Peoples R China
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
PROTEIN IDENTIFICATION; PEPTIDE IDENTIFICATION; PFIND;
DOI
10.1186/1471-2105-11-577
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the sharp increase in computational demand brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools use peptide indexing to accelerate processing. However, constructing a peptide index requires a large amount of time and space, especially for non-specific digestion, and the index is inflexible to use.
Results: We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is always coupled to the suffix array. It eliminates redundant candidate peptides in databases and reduces the number of peptide-spectrum matching operations, thereby decreasing the identification time. The algorithm relies on the property of the longest common prefix. Although enzymatic digestion poses a challenge to this property, the algorithm can be adjusted to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions.
Conclusions: The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm
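The idea of using the longest common prefix to skip redundant candidate peptides can be illustrated with a small Python sketch. This is not the ABLCP implementation from the paper: it assumes non-specific digestion, ignores peptide masses, enzymatic cleavage rules and modifications, and builds the suffix array naively. The function names, the `$` separator and the length limits are all hypothetical choices made for the illustration. Suffixes are visited in sorted order, and for each suffix only the prefixes longer than its LCP with the previous suffix are emitted, because the shorter ones were already produced.

```python
def suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    # Fine for a toy example; real tools use faster algorithms.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # lcp[rank] = length of the longest common prefix between the suffix
    # at this rank and the suffix at the previous rank (0 for rank 0).
    lcp = [0] * len(sa)
    for rank in range(1, len(sa)):
        a, b = sa[rank - 1], sa[rank]
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        lcp[rank] = k
    return lcp


def unique_peptides(proteins, min_len=3, max_len=4, sep="$"):
    # Hypothetical helper: enumerate each distinct peptide (substring of a
    # protein, length min_len..max_len) exactly once.
    text = sep.join(proteins) + sep  # every protein ends with a separator
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    peptides = []
    for rank, start in enumerate(sa):
        # A peptide must not run across a protein boundary.
        limit = min(max_len, text.index(sep, start) - start)
        # Prefixes of length <= lcp[rank] also start the previous suffix,
        # so they were already emitted; begin just past the shared part.
        first_new = max(lcp[rank] + 1, min_len)
        for length in range(first_new, limit + 1):
            peptides.append(text[start:start + length])
    return peptides


proteins = ["MKVLAAGK", "AAGKMKV"]  # toy sequences, not real database entries
peptides = unique_peptides(proteins, min_len=3, max_len=4)
print(len(peptides), "distinct candidate peptides")
```

In this sketch each distinct peptide would be scored against a spectrum only once, however many times it occurs across the proteins, which is the source of the savings the abstract describes.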
Pages: 11