Plant-mSubP: a computational framework for the prediction of single- and multi-target protein subcellular localization using integrated machine-learning approaches

Cited by: 76
Authors
Sahu, Sitanshu S. [1]
Loaiza, Cristian D. [2]
Kaundal, Rakesh [2,3]
Affiliations
[1] Birla Inst Technol, Dept Elect & Commun Engn, Ranchi 835215, Bihar, India
[2] Utah State Univ, Coll Agr & Appl Sci, Ctr Integrated BioSyst, Dept Plants Soils & Climate, Logan, UT 84322 USA
[3] Utah State Univ, Ctr Integrated BioSyst, Bioinformat Facil, Logan, UT 84322 USA
Keywords
Artificial intelligence; machine learning; multi-location; prediction tool; protein science; subcellular localization web server; WEB SERVER; CLASSIFIER; LOCATION; SEQUENCE; ATTRIBUTES; VIRUS
DOI
10.1093/aobpla/plz068
Chinese Library Classification number
Q94 [Botany]
Subject classification code
071001
Abstract
The subcellular localization of proteins is very important for characterizing their function in a cell. Accurate computational prediction of subcellular locations has been an active area of interest. Most of the work has focused on single-localization prediction. Only a few studies have addressed multi-target localization, and they have not yet achieved good accuracy; in plant sciences, very limited work has been done. Here we report the development of a novel tool, Plant-mSubP, which uses integrated machine-learning approaches to efficiently predict subcellular localizations in plant proteomes. The proposed approach predicts 11 single localizations and three dual locations of the plant cell with high accuracy. Several features based on the composition and physicochemical properties of a protein, such as amino acid composition, pseudo amino acid composition, auto-correlation descriptors and quasi-sequence-order descriptors, as well as hybrid combinations of these, are used to represent the protein. The performance of the proposed method has been assessed on a training set as well as an independent test set. Using the hybrid feature of pseudo amino acid composition, N-Center-C terminal amino acid composition and dipeptide composition (PseAAC-NCC-DIPEP), overall accuracies of 81.97%, 84.75% and 87.88% are achieved on the training data sets of single-label, combined single- and dual-label, and dual-label proteins, respectively. On the independent test data, accuracies of 64.36%, 64.84% and 81.08% are achieved on single-label, combined single- and dual-label, and dual-label proteins, respectively. The prediction models have been implemented on a web server available at http://bioinfo.usu.edu/Plant-mSubP/. The results indicate that the proposed approach is comparable to existing methods in single-localization prediction and outperforms all other existing tools on dual-label proteins.
The prediction tool will be a useful resource for better annotation of various plant proteomes.
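The composition-based encodings named in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes only the 20 standard residues are counted, uses overlapping dipeptide counts, and splits each sequence into N-terminal, central and C-terminal segments with a hypothetical `terminal_len` parameter; all function names are likewise illustrative. The PseAAC part of the PseAAC-NCC-DIPEP hybrid is omitted because it depends on physicochemical property scales not given in this record.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes); other letters are discarded.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature vectors line up.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def _clean(seq: str) -> str:
    """Upper-case the sequence and keep only the 20 standard residues."""
    return "".join(c for c in seq.upper() if c in AMINO_ACIDS)


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = _clean(seq)
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Frequency of each of the 400 overlapping adjacent residue pairs."""
    seq = _clean(seq)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [pairs[d] / total for d in DIPEPTIDES]


def ncc_composition(seq: str, terminal_len: int = 25) -> list[float]:
    """AAC of the N-terminal, central and C-terminal segments (3 x 20 = 60 values).

    terminal_len is a hypothetical default, not a value taken from Plant-mSubP.
    """
    if terminal_len < 1:
        raise ValueError("terminal_len must be at least 1")
    seq = _clean(seq)
    n_term = seq[:terminal_len]
    center = seq[terminal_len:-terminal_len]  # empty when len(seq) <= 2 * terminal_len
    c_term = seq[-terminal_len:]
    return aac(n_term) + aac(center) + aac(c_term)


def ncc_dipep(seq: str, terminal_len: int = 25) -> list[float]:
    """Hybrid vector: NCC (60 values) followed by dipeptide composition (400 values)."""
    return ncc_composition(seq, terminal_len) + dipeptide_composition(seq)


if __name__ == "__main__":
    toy = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary toy sequence, not from the paper's data
    print(len(ncc_dipep(toy, terminal_len=5)))  # 460
```

In this sketch the hybrid is a plain concatenation, which yields a fixed-length vector (460 values) for any protein length, so a standard classifier can consume it directly; how the paper actually combines its hybrid components is not specified in this record.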
Pages: 10