Support vector machine approach for protein subcellular localization prediction

Cited by: 703
Authors
Hua, SJ [1 ]
Sun, ZR [1 ]
Affiliations
[1] Tsing Hua Univ, Dept Biol Sci & Biotechnol, Inst Bioinformat, State Key Lab Biomembrane & Membrane Biotechnol, Beijing 100084, Peoples R China
DOI
10.1093/bioinformatics/17.8.721
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences.
Results: In this paper, the Support Vector Machine (SVM) is introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals.
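The abstract states that the classifier takes amino acid composition as its input. As a rough illustration only, the sketch below computes a 20-dimensional composition vector from a one-letter protein sequence. The function name `amino_acid_composition`, the residue ordering, and the choice to skip non-standard characters are assumptions made for this example; the record does not describe the authors' actual preprocessing.

```python
from collections import Counter
from typing import List

# The 20 standard residues in one-letter code (ordering is an arbitrary choice here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard residue in `sequence`.

    Hypothetical helper: characters outside the 20 standard residues
    (for example 'X' or gaps) are ignored, which is an assumption.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One fraction per residue, in the fixed order given by AMINO_ACIDS.
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Example: a short made-up peptide, not a real protein.
composition = amino_acid_composition("MKKLLA")
```

A vector like `composition` could then be passed to any SVM implementation as the feature representation of one protein. Because the features are global fractions over the whole chain, changing a few residues at one end of a long sequence shifts the vector only slightly, which is one plausible reading of the robustness to N-terminal errors reported in the abstract.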
Pages: 721-728
Number of pages: 8
Related papers
33 in total
[1]   Adaptation of protein surfaces to subcellular location [J].
Andrade, MA ;
O'Donoghue, SI ;
Rost, B .
JOURNAL OF MOLECULAR BIOLOGY, 1998, 276 (02) :517-525
[2]   Knowledge-based analysis of microarray gene expression data by using support vector machines [J].
Brown, MPS ;
Grundy, WN ;
Lin, D ;
Cristianini, N ;
Sugnet, CW ;
Furey, TS ;
Ares, M ;
Haussler, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) :262-267
[3]   Relation between amino acid composition and cellular location of proteins [J].
Cedano, J ;
Aloy, P ;
PerezPons, JA ;
Querol, E .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 266 (03) :594-600
[4]   Protein subcellular location prediction [J].
Chou, KC ;
Elrod, DW .
PROTEIN ENGINEERING, 1999, 12 (02) :107-118
[5]  
CORTES C, 1995, MACH LEARN, V20, P273, DOI 10.1023/A:1022627411411
[6]   A Bayesian system integrating expression data with sequence patterns for localizing proteins: Comprehensive application to the yeast genome [J].
Drawid, A ;
Gerstein, M .
JOURNAL OF MOLECULAR BIOLOGY, 2000, 301 (04) :1059-1075
[7]   Support vector machines for spam categorization [J].
Drucker, H ;
Wu, DH ;
Vapnik, VN .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1999, 10 (05) :1048-1054
[8]   Wanted: subcellular localization of proteins based on sequence [J].
Eisenhaber, F ;
Bork, P .
TRENDS IN CELL BIOLOGY, 1998, 8 (04) :169-170
[9]   Predicting subcellular localization of proteins based on their N-terminal amino acid sequence [J].
Emanuelsson, O ;
Nielsen, H ;
Brunak, S ;
von Heijne, G .
JOURNAL OF MOLECULAR BIOLOGY, 2000, 300 (04) :1005-1016
[10]   Starts of bacterial genes: estimating the reliability of computer predictions [J].
Frishman, D ;
Mironov, A ;
Gelfand, M .
GENE, 1999, 234 (02) :257-265