Speeding up tandem mass spectrometry-based database searching by longest common prefix

Cited by: 6

Authors:

Zhou, Chen [1,2,3]

Chi, Hao [1,2,3]

Wang, Le-Heng [1,2]

Li, You [1,2]

Wu, Yan-Jie [1,2,3]

Fu, Yan [1,2]

Sun, Rui-Xiang [1,2]

He, Si-Min [1,2]

Affiliations:

[1] Chinese Acad Sci, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China

[2] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China

[3] Chinese Acad Sci, Grad Univ, Beijing 100049, Peoples R China

Source:

BMC BIOINFORMATICS | 2010 / Vol. 11

Keywords:

PROTEIN IDENTIFICATION; PEPTIDE IDENTIFICATION; PFIND;

DOI:

10.1186/1471-2105-11-577

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Background: Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the sharp increase in computational demand brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools use peptide indexing to accelerate processing. However, building a peptide index requires a large amount of time and space, especially for non-specific digestion, and the index is inflexible to use.

Results: We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is always coupled to the suffix array. ABLCP eliminates redundant candidate peptides in the database and correspondingly reduces the number of peptide-spectrum matches, thereby decreasing the identification time. The algorithm relies on the property of the longest common prefix. Although enzymatic digestion poses a challenge to this property, the algorithm can be adjusted to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions.

Conclusions: The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm
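To illustrate how a longest common prefix (LCP) array coupled to a suffix array can eliminate redundant candidate peptides, the following is a minimal sketch for the non-specific digestion case only. It is not the ABLCP implementation from the paper or from pFind: the function names, the `$` separator, the length limits, and the naive quadratic construction are all assumptions chosen for readability, and the adjustments needed for enzymatic digestion are not covered.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp_array(text, sa):
    # lcp[k] = length of the common prefix of the suffixes starting at
    # sa[k-1] and sa[k] in sorted order; lcp[0] is 0 by convention.
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def unique_peptides(proteins, min_len, max_len, sep="$"):
    # Hypothetical helper: enumerate every distinct substring (candidate
    # peptide under non-specific digestion) exactly once.
    text = sep.join(proteins) + sep
    sa = build_suffix_array(text)
    lcp = build_lcp_array(text, sa)
    peptides = []
    for k, start in enumerate(sa):
        # A peptide must not cross a protein boundary (the separator).
        end = text.index(sep, start)
        longest = min(max_len, end - start)
        # Prefixes of length <= lcp[k] are shared with the previous suffix
        # and were already emitted, so only longer prefixes are new.
        first_new = max(min_len, lcp[k] + 1)
        for length in range(first_new, longest + 1):
            peptides.append(text[start:start + length])
    return peptides


if __name__ == "__main__":
    print(unique_peptides(["MKAMK", "AMKA"], min_len=2, max_len=3))
```

For the two toy proteins above, a plain enumeration would generate 12 candidate substrings of length 2 to 3, whereas the sketch emits only the 6 distinct ones, so each would need to be matched against spectra once rather than repeatedly.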


Pages: 11

