Predicting subcellular location of proteins using integrated-algorithm method

Cited by: 21
Authors
Cai, Yu-Dong [1, 2]
Lu, Lin [3]
Chen, Lei [4]
He, Jian-Feng [3]
Affiliations
[1] Shanghai Univ, Inst Syst Biol, Shanghai 200244, Peoples R China
[2] Fudan Univ, Ctr Computat Syst Biol, Shanghai 200433, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Biomed Engn, Shanghai 200040, Peoples R China
[4] E China Normal Univ, Shanghai Key Lab Trustworthy Comp, Shanghai 200062, Peoples R China
Keywords
mRMR (Minimum redundancy maximum relevance); Subcellular localization; Amino acid composition; Integrated-algorithm method; Weka; AMINO-ACID-COMPOSITION; LOCALIZATION; SEQUENCE;
DOI
10.1007/s11030-009-9182-4
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010; 081704;
Abstract
A protein's subcellular location, which indicates where the protein resides in a cell, is one of its important characteristics. Correctly assigning proteins to their subcellular locations would greatly help the prediction of protein function, genome annotation, and drug design. Yet, despite the great technical advances of the past decades, experimentally determining protein subcellular locations on a high-throughput scale is still time-consuming and laborious. Hence, four integrated-algorithm methods were developed in this article to perform such high-throughput prediction. Two data sets taken from the literature (Chou and Elrod, Protein Eng 12:107-118, 1999), consisting of 2,391 and 2,598 proteins, were used as the training set and the test set, respectively. Amino acid composition was used to represent the protein sequences, and jackknife cross-validation was used to evaluate performance on the training set. The best final integrated-algorithm predictor was constructed by integrating 10 algorithms in Weka (a software tool for data mining tasks, http://www.cs.waikato.ac.nz/ml/weka/), selected with the mRMR (Minimum Redundancy Maximum Relevance, http://research.janelia.org/peng/proj/mRMR/) method. It achieves correct prediction rates of 77.83% and 80.56% on the training set and the test set, respectively, which is better than each of the 60 algorithms collected in Weka. The prediction software is available upon request.
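The two generic building blocks named in the abstract, amino acid composition and jackknife (leave-one-out) testing, can be illustrated with a minimal Python sketch. This is not the authors' software: the function names are hypothetical, and a 1-nearest-neighbour rule stands in for the Weka classifiers and the mRMR-based integration, neither of which is reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional vector of residue frequencies."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def nearest_neighbor_label(query, vectors, labels):
    """Stand-in classifier (assumption): label of the closest vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(range(len(vectors)),
               key=lambda i: squared_distance(query, vectors[i]))
    return labels[best]


def jackknife_accuracy(vectors, labels):
    """Leave-one-out test: predict each protein from all the others."""
    correct = 0
    for i in range(len(vectors)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        predicted = nearest_neighbor_label(vectors[i], rest_vectors, rest_labels)
        if predicted == labels[i]:
            correct += 1
    return correct / len(vectors)


if __name__ == "__main__":
    # Toy sequences and labels, invented purely for illustration.
    toy_sequences = ["AAAA", "AAAC", "WWWW", "WWWY"]
    toy_labels = ["location_1", "location_1", "location_2", "location_2"]
    toy_vectors = [amino_acid_composition(s) for s in toy_sequences]
    print(jackknife_accuracy(toy_vectors, toy_labels))
```

In the jackknife test each protein is held out once and predicted from all remaining proteins, so the reported rate is the fraction of held-out proteins assigned to their correct location.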
Pages: 551-558
Page count: 8
Related papers
13 in total
[1]   Predicting membrane protein type by functional domain composition and pseudo-amino acid composition [J].
Cai, YD ;
Chou, KC .
JOURNAL OF THEORETICAL BIOLOGY, 2006, 238 (02) :395-400
[2]   Relation between amino acid composition and cellular location of proteins [J].
Cedano, J ;
Aloy, P ;
PerezPons, JA ;
Querol, E .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 266 (03) :594-600
[3]   Protein subcellular location prediction [J].
Chou, KC ;
Elrod, DW .
PROTEIN ENGINEERING, 1999, 12 (02) :107-118
[4]   Wanted: subcellular localization of proteins based on sequence [J].
Eisenhaber, F ;
Bork, P .
TRENDS IN CELL BIOLOGY, 1998, 8 (04) :169-170
[5]  
Frank E., 2005, DATA MINING PRACTICA
[6]   BioWeka - extending the Weka framework for bioinformatics [J].
Gewehr, Jan E. ;
Szugat, Martin ;
Zimmer, Ralf .
BIOINFORMATICS, 2007, 23 (05) :651-653
[7]   2D-RNA-coupling numbers: A new computational chemistry approach to link secondary structure topology with biological function [J].
Gonzalez-Diaz, Humberto ;
Agueero-Chapin, Guillermin ;
Varona, Javier ;
Molina, Reinaldo ;
Delogu, Giovanna ;
Santana, Lourdes ;
Uriarte, Eugenio ;
Podda, Gianni .
JOURNAL OF COMPUTATIONAL CHEMISTRY, 2007, 28 (06) :1049-1056
[8]   Support vector machine approach for protein subcellular localization prediction [J].
Hua, SJ ;
Sun, ZR .
BIOINFORMATICS, 2001, 17 (08) :721-728
[9]   Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices [J].
Munteanu, Cristian Robert ;
Gonzalez-Diaz, Humberto ;
Magalhaes, Alexandre L. .
JOURNAL OF THEORETICAL BIOLOGY, 2008, 254 (02) :476-482
[10]   Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy [J].
Peng, HC ;
Long, FH ;
Ding, C .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2005, 27 (08) :1226-1238