Plant-mSubP: a computational framework for the prediction of single- and multi-target protein subcellular localization using integrated machine-learning approaches

Times cited: 76
Authors
Sahu, Sitanshu S. [1]
Loaiza, Cristian D. [2]
Kaundal, Rakesh [2,3]
Affiliations
[1] Birla Inst Technol, Dept Elect & Commun Engn, Ranchi 835215, Bihar, India
[2] Utah State Univ, Coll Agr & Appl Sci, Ctr Integrated BioSyst, Dept Plants Soils & Climate, Logan, UT 84322 USA
[3] Utah State Univ, Ctr Integrated BioSyst, Bioinformat Facil, Logan, UT 84322 USA
Keywords
Artificial intelligence; machine learning; multi-location; prediction tool; protein science; subcellular localization web server; WEB SERVER; CLASSIFIER; LOCATION; SEQUENCE; ATTRIBUTES; VIRUS;
DOI
10.1093/aobpla/plz068
Chinese Library Classification
Q94 [Botany];
Discipline classification code
071001;
Abstract
The subcellular localization of a protein is key to characterizing its function in the cell. Accurate computational prediction of subcellular location has long been an active area of research. Most work has focused on single-location prediction. Only a few studies have addressed multi-target localization, and they have not yet achieved good accuracy; in the plant sciences, very little work has been done. Here we report the development of a novel tool, Plant-mSubP, which is based on integrated machine-learning approaches to efficiently predict subcellular localization in plant proteomes. The proposed approach predicts 11 single localizations and three dual localizations in the plant cell with high accuracy. Several features based on the composition and physicochemical properties of a protein, such as amino acid composition, pseudo amino acid composition, auto-correlation descriptors, quasi-sequence-order descriptors and hybrids of these, are used to represent the protein. The performance of the proposed method has been assessed on a training set as well as an independent test set. Using the hybrid feature combining pseudo amino acid composition, N-Center-C terminal amino acid composition and dipeptide composition (PseAAC-NCC-DIPEP), overall accuracies of 81.97 %, 84.75 % and 87.88 % are achieved on the training data sets containing single-label, combined single- and dual-label, and dual-label proteins, respectively. On the independent data, accuracies of 64.36 %, 64.84 % and 81.08 % are achieved on the single-label, combined single- and dual-label, and dual-label proteins, respectively. The prediction models have been implemented on a web server available at http://bioinfo.usu.edu/Plant-mSubP/. The results indicate that the proposed approach is comparable to existing methods in single-localization prediction and outperforms all other existing tools on dual-label proteins.
The prediction tool will be a useful resource for better annotation of various plant proteomes.
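
The composition-based features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 25-residue terminal length and the handling of short sequences are assumptions made for illustration, and the pseudo amino acid composition (PseAAC) component of the hybrid feature is omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    return [c / total if total else 0.0 for c in counts]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in DIPEPTIDES]


def ncc_composition(seq, terminal_len=25):
    """AAC of the N-terminal, center and C-terminal segments (60 values).

    terminal_len is an assumed value; it is shrunk for short sequences so
    that the three segments never overlap.
    """
    k = min(terminal_len, len(seq) // 3)
    n_term = seq[:k]
    center = seq[k:len(seq) - k]
    c_term = seq[len(seq) - k:]
    return aac(n_term) + aac(center) + aac(c_term)


def hybrid_features(seq):
    """Concatenate NCC and dipeptide composition (PseAAC omitted): 460 values."""
    return ncc_composition(seq) + dipeptide_composition(seq)
```

A vector built this way would then be passed to a classifier; the abstract does not specify the classifier configuration, so that step is left out here.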
Pages: 10