Gram-positive and gram-negative subcellular localization using rotation forest and physicochemical-based features

Times cited: 24
Authors
Dehzangi, Abdollah [1,2]
Sohrabi, Sohrab [3]
Heffernan, Rhys [3]
Sharma, Alok [1,4]
Lyons, James [3]
Paliwal, Kuldip [3]
Sattar, Abdul [1,2]
Affiliations
[1] Griffith Univ, Inst Integrated & Intelligent Syst, Brisbane, Qld 4111, Australia
[2] Natl ICT Australia NICTA, Brisbane, Qld, Australia
[3] Griffith Univ, Sch Engn, Brisbane, Qld 4111, Australia
[4] Univ S Pacific, Sch Engn, Suva, Fiji
Funding
Australian Research Council
Keywords
FOLD PREDICTION-PROBLEM; AMINO-ACID-COMPOSITION; PROTEINS; ENSEMBLE; CLASSIFIER; LOCATIONS; FUSION; PLOC
DOI
10.1186/1471-2105-16-S4-S1
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The function of a protein depends on its location in the cell, so predicting protein subcellular localization is an important step towards predicting protein function. Recent studies have shown that extracting features from Gene Ontology (GO) annotations can improve prediction performance. However, GO annotations are not available for newly sequenced proteins, and for these proteins the performance of GO-based methods degrades significantly.

Results: In this study, we develop a method that effectively exploits physicochemical and evolutionary-based information in the protein sequence. To do this, we propose a segmentation-based feature extraction method that explores potential discriminatory information in the physicochemical properties of the amino acids to tackle Gram-positive and Gram-negative subcellular localization. We apply the proposed feature extraction techniques to 10 attributes that were experimentally selected from a wide range of physicochemical attributes. Finally, by applying the Rotation Forest classification technique to the extracted features, we improve Gram-positive and Gram-negative subcellular localization accuracy by up to 3.4% over previous studies that used GO for feature extraction.

Conclusion: By combining segmentation-based feature extraction from the physicochemical properties of the amino acids with the Rotation Forest classification technique, we significantly improve Gram-positive and Gram-negative subcellular localization prediction accuracy.
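The abstract does not specify how the sequence is segmented. As an illustration only, the following Python sketch assumes one simple form of segmentation-based feature: each residue is mapped to a value on a physicochemical scale, the resulting profile is split into `k` contiguous segments of near-equal length, and the mean value of each segment becomes one feature. The Kyte-Doolittle hydropathy scale stands in for the paper's 10 selected attributes, and `segment_features` is a hypothetical helper, not the authors' code.

```python
# Kyte-Doolittle hydropathy scale, used here only as an example
# physicochemical attribute (an assumption; not necessarily one of
# the 10 attributes selected in the paper).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def segment_features(sequence, scale, k):
    """Hypothetical segmentation-based features: map residues to property
    values, split the profile into k contiguous segments of near-equal
    length, and return the mean property value of each segment."""
    # Residues missing from the scale (e.g. 'X') are skipped in this sketch.
    profile = [scale[aa] for aa in sequence.upper() if aa in scale]
    n = len(profile)
    if n < k:
        raise ValueError("sequence is shorter than the number of segments")
    features = []
    for i in range(k):
        # Integer boundaries spread any remainder across the segments,
        # and n >= k guarantees every segment is non-empty.
        start = i * n // k
        end = (i + 1) * n // k
        segment = profile[start:end]
        features.append(sum(segment) / len(segment))
    return features
```

For example, `segment_features("AAAAKKKK", KYTE_DOOLITTLE, 2)` returns roughly `[1.8, -3.9]`. Repeating the call for each attribute and concatenating the results would give one fixed-length vector per protein, regardless of sequence length, which is the kind of input a classifier such as Rotation Forest expects.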
Pages: 8
Number of references: 35