ngLOC: an n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes

Cited by: 74
Authors
King, Brian R.
Guda, Chittibabu
Affiliations
[1] SUNY Albany, GenNYsis Ctr Excellence Canc Genom, Rensselaer, NY 12144 USA
[2] SUNY Albany, Dept Comp Sci, Albany, NY 12222 USA
[3] SUNY Albany, Dept Epidemiol & Biostat, Rensselaer, NY 12144 USA
DOI
10.1186/gb-2007-8-5-r68
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
We present a method called ngLOC, an n-gram-based Bayesian classifier that predicts the localization of a protein sequence over ten distinct subcellular organelles. A tenfold cross-validation result shows an accuracy of 89% for sequences localized to a single organelle, and 82% for those localized to multiple organelles. An enhanced version of ngLOC was developed to estimate the subcellular proteomes of eight eukaryotic organisms: yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse, and human.
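As a rough illustration of what an n-gram-based Bayesian classifier over protein sequences can look like, the sketch below implements a generic multinomial naive Bayes model on overlapping n-grams with additive smoothing. It is not the ngLOC implementation: the class name `NGramNaiveBayes`, the parameters `n` and `alpha`, the smoothing scheme, and the toy sequences and labels (`loc_a`, `loc_b`) are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return all overlapping substrings of length n in seq."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NGramNaiveBayes:
    """Hypothetical multinomial naive Bayes over sequence n-grams."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n                          # n-gram length (assumed value)
        self.alpha = alpha                  # additive smoothing constant (assumed)
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total n-grams seen
        self.docs = Counter()               # label -> number of training sequences
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            grams = ngrams(seq, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, seq):
        """Return log prior + summed log likelihood for each label."""
        n_docs = sum(self.docs.values())
        v = len(self.vocab) + 1  # one extra bucket for unseen n-grams
        scores = {}
        for label in self.docs:
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * v
            for g in ngrams(seq, self.n):
                score += math.log((self.counts[label][g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Toy strings and labels, made up purely to exercise the code.
train_seqs = ["AAAAAAAA", "AAAAGAAA", "CCCCCCCC", "CCCCTCCC"]
train_labels = ["loc_a", "loc_a", "loc_b", "loc_b"]
model = NGramNaiveBayes(n=3).fit(train_seqs, train_labels)
prediction = model.predict("AAAAA")
```

Working in log space avoids numerical underflow when many small n-gram probabilities are multiplied, and the smoothing constant keeps an unseen n-gram from driving a class score to negative infinity.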
Pages: 17