Computational prediction of protein ubiquitination sites mapping on Arabidopsis thaliana

Cited by: 29
Authors
Mosharaf, Md Parvez [1]
Hassan, Md Mehedi [2]
Ahmed, Fee Faysal [1,3]
Khatun, Shamima [2]
Moni, Mohammad Ali [4]
Mollah, Md Nurul Haque [1]
Affiliations
[1] Rajshahi Univ, Dept Stat, Bioinformat Lab, Rajshahi 6205, Bangladesh
[2] Kyushu Inst Technol, Dept Biosci & Bioinformat, 680-4 Kawazu, Iizuka, Fukuoka 8208502, Japan
[3] Jashore Univ Sci & Technol, Dept Math, Jashore, Bangladesh
[4] Univ Sydney, Sydney Med Sch, Sch Med Sci, Discipline Biomed Sci, Sydney, NSW, Australia
Keywords
Arabidopsis thaliana species; Protein sequences; Ubiquitination sites; CKSAAP encoding; Random forest; Lysine ubiquitination; Sequence; Identification
DOI
10.1016/j.compbiolchem.2020.107238
Chinese Library Classification
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Among protein post-translational modifications (PTMs), ubiquitination is considered one of the most significant: it regulates cellular functions and is involved in various diseases. Identifying ubiquitination sites is therefore important for understanding the mechanisms of ubiquitination-related biological processes. Both experimental and computational approaches are available for identifying ubiquitination sites from the protein sequences of different species. Experimental approaches are time-consuming, laborious and costly; in silico prediction is a faster, easier and more cost-effective alternative. Moreover, the sequence patterns around ubiquitination sites differ between species, which calls for species-specific predictors. In this study, we therefore propose a novel computational method for identifying ubiquitination sites from protein sequences of A. thaliana that is also robust against outlying observations. In a comparative study of two encoding schemes and three classifiers, the random forest (RF) based predictor using the CKSAAP (composition of k-spaced amino acid pairs) encoding scheme, trained with a 1:1 ratio of positive to negative samples (i.e. ubiquitinated to non-ubiquitinated sites), was selected as the best predictor. The proposed predictor achieved an area under the ROC curve (AUC) of 0.91 in 5-fold cross-validation on the training dataset and 0.86 on the independent test dataset of A. thaliana. The proposed RF-based predictor also performed much better than the existing ubiquitination site predictors for A. thaliana.
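The CKSAAP encoding named in the abstract can be illustrated with a minimal sketch. The abstract does not give the window length, the maximum gap, or how padding residues are handled, so the values below (an 11-residue window, `k_max=2`, and skipping pairs that contain a non-standard symbol) are assumptions for illustration only, not the authors' settings. The function name `cksaap` and the example window are likewise hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window, k_max=3):
    """Return a CKSAAP feature vector for one sequence window.

    For each gap k = 0..k_max, count every residue pair
    (window[i], window[i + k + 1]) and divide by the number of pair
    positions, len(window) - k - 1.
    Assumption: pairs containing a non-standard symbol (e.g. padding 'X')
    are simply not counted.
    """
    features = []
    for k in range(k_max + 1):
        n_positions = len(window) - k - 1
        counts = dict.fromkeys(PAIRS, 0)
        for i in range(max(n_positions, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(
            counts[p] / n_positions if n_positions > 0 else 0.0 for p in PAIRS
        )
    return features


# Hypothetical 11-residue window centred on a lysine (K); not taken from the paper's data.
vec = cksaap("AAKDEKLMKAA", k_max=2)
print(len(vec))  # 3 gaps x 400 pairs = 1200 features
```

Each window yields a fixed-length numeric vector regardless of its residue content, so vectors of this kind could be passed directly to a classifier such as a random forest.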
Pages: 7