Subcellular localization prediction with new protein encoding schemes

Cited by: 11
Authors
Ogul, Hasan [1]
Mumcuoglu, Erkan U.
Affiliations
[1] Baskent Univ, Dept Comp Engn, TR-06490 Ankara, Turkey
[2] Middle E Tech Univ, Dept Hlth Informat, Inst Informat, TR-06531 Ankara, Turkey
Keywords
n-peptide composition; probabilistic suffix tree; subcellular localization; support vector machines
DOI
10.1109/TCBB.2007.070209
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Subcellular localization is one of the key properties in functional annotation of proteins. Support vector machines (SVMs) have been widely used for automated prediction of subcellular localization. Existing methods differ in the protein encoding schemes they use. In this study, we present two protein encoding methods for SVM-based subcellular localization prediction: n-peptide compositions with reduced amino acid alphabets for larger values of n, and pairwise sequence similarity scores based on the whole sequence and the N-terminal sequence. We tested the methods on a common benchmarking data set that consists of 2,427 eukaryotic proteins with four localization sites. In 5-fold cross-validation tests, the encoding with n-peptide compositions achieved accuracies of 84.5, 88.9, 66.3, and 94.3 percent for cytoplasmic, extracellular, mitochondrial, and nuclear proteins, respectively, with an overall accuracy of 87.1 percent. The second method achieved accuracies of 83.6, 87.7, 87.9, and 90.5 percent for the individual locations and an overall accuracy of 87.8 percent. A hybrid system, which we call PredLOC, makes a final decision based on the results of the two presented methods; it achieved an overall accuracy of 91.3 percent, which is better than that of many existing methods. The new system also outperformed recent methods in experiments conducted on a new, unique SWISSPROT test set.
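To make the first encoding concrete, the following is a minimal sketch of an n-peptide composition computed over a reduced amino acid alphabet. The six-group alphabet, the group symbols, and the function names `reduce_sequence` and `n_peptide_composition` are assumptions chosen for illustration; the abstract does not state which grouping the authors used.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet: an illustrative grouping by rough
# physicochemical similarity, NOT the grouping used in the paper.
GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "B": "FWYH",    # aromatic
    "C": "STNQ",    # polar uncharged
    "D": "KR",      # positively charged
    "E": "DE",      # negatively charged
    "F": "GP",      # small / special conformation
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group symbol; unknown residues become 'X'."""
    return "".join(RESIDUE_TO_GROUP.get(aa, "X") for aa in seq.upper())


def n_peptide_composition(seq, n):
    """Return a fixed-length vector of n-peptide frequencies over the reduced alphabet."""
    reduced = reduce_sequence(seq)
    symbols = sorted(GROUPS)
    keys = ["".join(p) for p in product(symbols, repeat=n)]
    # Count overlapping windows, skipping any window with an unknown residue.
    windows = [reduced[i:i + n] for i in range(len(reduced) - n + 1)]
    counts = Counter(w for w in windows if "X" not in w)
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in keys]


# Example: dipeptide (n = 2) composition of a short toy sequence.
vec = n_peptide_composition("MKRDE", 2)
```

With a reduced alphabet of k groups the vector has k^n entries instead of 20^n, which is what keeps larger values of n tractable; with the six assumed groups, n = 2 gives 36 features rather than 400. Vectors of this kind could then be passed to an SVM as fixed-length inputs.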
Pages: 227-232
Number of pages: 6