Prediction of subcellular location of mycobacterial protein using feature selection techniques

Cited by: 42
Authors
Lin, Hao [1]
Ding, Hui [1]
Guo, Feng-Biao [1]
Huang, Jian [1]
Affiliation
[1] Univ Elect Sci & Technol China, Sch Life Sci & Technol, Minist Educ, Key Lab NeuroInformat, Chengdu 610054, Peoples R China
Keywords
Protein subcellular localization; Pseudo amino acid composition; Feature selection; Mycobacterium tuberculosis; Reduced amino acids; ENSEMBLE CLASSIFIER; MEMBRANE-PROTEINS; LOCALIZATION; SEQUENCE; PLOC; QSAR; RECOGNITION; NETWORKS
DOI
10.1007/s11030-009-9205-1
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Mycobacterium tuberculosis is the primary pathogen causing tuberculosis, one of the most prevalent infectious diseases. The subcellular location of mycobacterial proteins can provide essential clues for protein function research and drug discovery. It is therefore highly desirable to develop a computational method for fast and reliable prediction of the subcellular location of mycobacterial proteins. In this study, we developed a support vector machine (SVM) based method for this task. A total of 444 non-redundant mycobacterial proteins were used to train and test the proposed model with jackknife cross-validation. Using traditional pseudo amino acid composition (PseAAC) as input parameters, an overall accuracy of 83.3% was achieved. A feature selection technique was then developed to find the optimal number of PseAAC components, which improved the overall accuracy from 83.3% to 87.2%. In addition, reduced amino acid information from the N-terminal and non-N-terminal regions of the proteins was combined in the models to further improve the prediction success rate. As a result, a maximum overall accuracy of 91.2% was achieved, with an average accuracy of 79.7%. The proposed model provides useful information for further experimental research. The prediction model can be accessed free of charge at http://cobi.uestc.edu.cn/cobi/people/hlin/webserver.
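The abstract reports both an overall accuracy and an average accuracy under jackknife (leave-one-out) testing. The following is a minimal Python sketch of how the two figures can differ, not the authors' implementation. It rests on several assumptions: plain 20-dimensional amino acid composition is used instead of full PseAAC, a 1-nearest-neighbour rule stands in for the SVM, "average accuracy" is taken to mean the mean of the per-class accuracies, and the sequences and class labels are invented toy data.

```python
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """20-dimensional amino acid composition (a simplification of PseAAC)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]

def nearest_label(query, train):
    """1-nearest-neighbour by squared Euclidean distance (stand-in for the SVM)."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(train, key=lambda item: dist(query, item[0]))[1]

def jackknife(samples):
    """Leave each sample out once and predict it from all the others."""
    results = []
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        results.append((label, nearest_label(features, rest)))
    return results

def overall_and_average_accuracy(results):
    """Overall = correct / total; average = mean of per-class accuracies (assumed)."""
    correct, total = defaultdict(int), defaultdict(int)
    for true, pred in results:
        total[true] += 1
        correct[true] += int(true == pred)
    overall = sum(correct.values()) / sum(total.values())
    average = sum(correct[c] / total[c] for c in total) / len(total)
    return overall, average

# Hypothetical toy sequences and labels, for illustration only.
toy = [
    ("AAAAALLLLLVVVVV", "membrane"),
    ("AAALLLLLLVVVVVI", "membrane"),
    ("LLLLAAAVVVVIIII", "membrane"),
    ("KKKKDDDDEEEERRR", "cytoplasm"),
    ("KKKDDDEEEEERRRR", "cytoplasm"),
    ("SSSSTTTTNNNNQQQ", "secreted"),
]
samples = [(composition(s), y) for s, y in toy]
overall, average = overall_and_average_accuracy(jackknife(samples))
print(f"overall accuracy: {overall:.3f}, average accuracy: {average:.3f}")
```

In this toy set the single "secreted" sequence can never be predicted correctly once it is left out, so the average accuracy falls below the overall accuracy. A gap of that kind is one possible reading of the 91.2% versus 79.7% figures above, if the classes in the real data set are unevenly sized.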
Pages: 667-671
Number of pages: 5