A Multi-Label Classifier for Predicting the Subcellular Localization of Gram-Negative Bacterial Proteins with Both Single and Multiple Sites

Cited by: 232
Authors
Xiao, Xuan [1, 2]
Wu, Zhi-Cheng [1]
Chou, Kuo-Chen [2]
Affiliations
[1] Jing De Zhen Ceram Inst, Dept Comp, Jing De Zhen, Peoples R China
[2] Gordon Life Sci Inst, San Diego, CA USA
Funding
National Natural Science Foundation of China
Keywords
AMINO-ACID-COMPOSITION; NEURAL DISCRIMINANT MODEL; TURN TYPES PREDICTION; LOCATION PREDICTION; STRUCTURAL CLASSES; GENE ONTOLOGY; SORTING SIGNALS; ANNOTATION; REPRESENTATION; STATISTICS;
DOI
10.1371/journal.pone.0020592
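Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract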
Prediction of protein subcellular localization is a challenging problem, particularly when the system concerned contains both singleplex and multiplex proteins. In this paper, by introducing the "multi-label scale" and hybridizing gene ontology information with sequential evolution information, a novel predictor called iLoc-Gneg is developed for predicting the subcellular localization of Gram-negative bacterial proteins with both single-location and multiple-location sites. To facilitate comparison, the same stringent benchmark dataset used to estimate the accuracy of Gneg-mPLoc was adopted to demonstrate the power of iLoc-Gneg. The dataset contains 1,392 Gram-negative bacterial proteins classified into the following eight locations: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane, and (8) periplasm. Of the 1,392 proteins, 1,328 have only one subcellular location and the other 64 have two subcellular locations, and none of the proteins included has ≥25% pairwise sequence identity to any other in the same subset (subcellular location). The overall success rate achieved by iLoc-Gneg in the jackknife test on this stringent benchmark dataset was over 91%, which is about 6% higher than that of Gneg-mPLoc. As a user-friendly web-server, iLoc-Gneg is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Gneg. A step-by-step guide is also provided on how to use the web-server to get the desired results. Furthermore, for the user's convenience, the iLoc-Gneg web-server accepts batch job submission, which is not available in the existing version of the Gneg-mPLoc web-server. It is anticipated that iLoc-Gneg may become a useful high-throughput tool for Molecular Cell Biology, Proteomics, Systems Biology, and Drug Development.
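As an illustration of the evaluation setting described in the abstract, the following is a minimal Python sketch of a jackknife (leave-one-out) test on multi-label data. It is not the iLoc-Gneg algorithm: the toy feature vectors, the 1-nearest-neighbour predictor, the helper names, and the success-rate definition (fraction of true protein-location pairs that are predicted) are all assumptions made for illustration. The only figures taken from the abstract are the dataset counts, from which the number of protein-location pairs follows as 1,328 + 2 × 64 = 1,456.

```python
from math import dist

# Hypothetical toy data, not from the paper: (feature vector, set of true locations).
TOY = [
    ((0.0, 0.0), {"cytoplasm"}),
    ((0.1, 0.0), {"cytoplasm"}),
    ((1.0, 1.0), {"periplasm"}),
    ((1.1, 1.0), {"periplasm", "outer membrane"}),
    ((1.0, 1.1), {"periplasm", "outer membrane"}),
]


def label_pair_count(single, double):
    """Number of (protein, location) pairs when `double` proteins have two locations."""
    return single + 2 * double


def predict_1nn(train, x):
    """Assumed stand-in predictor: copy the label set of the nearest training sample."""
    _, labels = min(train, key=lambda item: dist(item[0], x))
    return labels


def jackknife_success_rate(data):
    """Leave each sample out in turn, predict it from the rest, and return the
    fraction of true (protein, location) pairs recovered (assumed metric)."""
    hits = total = 0
    for i, (x, true_labels) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every sample except the i-th
        predicted = predict_1nn(train, x)
        hits += len(true_labels & predicted)
        total += len(true_labels)
    return hits / total


if __name__ == "__main__":
    print(label_pair_count(1328, 64))
    print(jackknife_success_rate(TOY))
```

Counting over label pairs rather than over proteins means that a two-location protein with only one location predicted contributes a partial hit, which is one simple way to score multiplex proteins; other multi-label metrics are possible.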
Pages: 10