Predicting protein subcellular location by fusing multiple classifiers

被引:97
作者
Chou, Kuo-Chen
Shen, Hong-Bin
机构
[1] Gordon Life Sci Inst, San Diego, CA 92130 USA
[2] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200030, Peoples R China
关键词
cellular networking; subcellular compartment; ensemble classifier; fusion; voting; covariant discriminant algorithm; amphiphilic pseudo amino acid composition;
D O I
10.1002/jcb.20879
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
One of the fundamental goals in cell biology and proteomics is to identify the functions of proteins in the context of compartments that organize them in the cellular environment. Knowledge of subcellular locations of proteins can provide key hints for revealing their functions and understanding how they interact with each other in cellular networking. Unfortunately, it is both time-consuming and expensive to determine the localization of an uncharacterized protein in a living cell purely based on experiments. With the avalanche of newly found protein sequences emerging in the post genomic era, we are facing a critical challenge, that is, how to develop an automated method to fast and reliably identify their subcellular locations so as to be able to timely use them for basic research and drug discovery. In view of this, an ensemble classifier was developed by the approach of fusing many basic individual classifiers through a voting system. Each of these basic classifiers was trained in a different dimension of the amphiphilic pseudo amino acid composition (Chou [2005] Bioinformatics 21: 10-19). As a demonstration, predictions were performed with the fusion classifier for proteins among the following 14 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cytoplasm, (5) cytoskeleton, (6) endoplasmic reticulum, (7) extracellular, (8) Golgi apparatus, (9) lysosome, (10) mitochondria, (11) nucleus, (12) peroxisome, (13) plasma membrane, and (14) vacuole. The overall success rates thus obtained via the resubstitution test, jackknife test, and independent dataset test were all significantly higher than those by the existing classifiers. It is anticipated that the novel ensemble classifier may also become a very useful vehicle in classifying other attributes of proteins according to their sequences, such as membrane protein type, enzyme family/sub-family, G-protein coupled receptor (GPCR) type, and structural class, among many others. The fusion ensemble classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.
引用
收藏
页码:517 / 527
页数:11
相关论文
共 71 条
[1]   The SWISS-PROT protein sequence data bank and its supplement TrEMBL [J].
Bairoch, A ;
Apweller, R .
NUCLEIC ACIDS RESEARCH, 1997, 25 (01) :31-36
[2]   Predicting membrane protein type by functional domain composition and pseudo-amino acid composition [J].
Cai, YD ;
Chou, KC .
JOURNAL OF THEORETICAL BIOLOGY, 2006, 238 (02) :395-400
[3]   Predicting enzyme subclass by functional domain composition and pseudo amino acid composition [J].
Cai, YD ;
Chou, KC .
JOURNAL OF PROTEOME RESEARCH, 2005, 4 (03) :967-971
[4]   Predicting enzyme family classes by hybridizing gene product composition and pseudo-amino acid composition [J].
Cai, YD ;
Zhou, GP ;
Chou, KC .
JOURNAL OF THEORETICAL BIOLOGY, 2005, 234 (01) :145-149
[5]   Predicting 22 protein localizations in budding yeast [J].
Cai, YD ;
Chou, KC .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2004, 323 (02) :425-428
[6]   Nearest neighbour algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition [J].
Cai, YD ;
Chou, KC .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2003, 305 (02) :407-411
[7]   Is it a paradox or misinterpretation? [J].
Cai, YD .
PROTEINS-STRUCTURE FUNCTION AND GENETICS, 2001, 43 (03) :336-338
[8]   Relation between amino acid composition and cellular location of proteins [J].
Cedano, J ;
Aloy, P ;
PerezPons, JA ;
Querol, E .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 266 (03) :594-600
[9]  
CHOU KC, 1984, J AM CHEM SOC, V106, P3161, DOI 10.1021/ja00323a017
[10]   Predicting protein-protein interactions from sequences in a hybridization space [J].
Chou, KC ;
Cai, YD .
JOURNAL OF PROTEOME RESEARCH, 2006, 5 (02) :316-322