PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features

Times cited: 14
Authors
Nilamyani, Andi Nur [1]
Auliah, Firda Nurul [1]
Moni, Mohammad Ali [2]
Shoombuatong, Watshara [3]
Hasan, Md Mehedi [1,4]
Kurata, Hiroyuki [1]
Affiliations
[1] Kyushu Inst Technol, Dept Biosci & Bioinformat, 680-4 Kawazu, Iizuka, Fukuoka 8208502, Japan
[2] UNSW Sydney, Sch Publ Hlth & Community Med, WHO Collaborating Ctr eHlth, UNSW Digital Hlth,Fac Med, Sydney, NSW 2052, Australia
[3] Mahidol Univ, Fac Med Technol, Ctr Data Min & Biomed Informat, Bangkok 10700, Thailand
[4] Japan Soc Promot Sci, Chiyoda Ku, 5-3-1 Kojimachi, Tokyo 1020083, Japan
Funding
Japan Society for the Promotion of Science
Keywords
nitrotyrosine; post-translational modification; feature encoding; RFE feature selection; machine learning; COMPUTATIONAL IDENTIFICATION; ANTIINFLAMMATORY PEPTIDES; BIOINFORMATICS TOOLS; 3-NITROTYROSINE; NITROSYLATION
DOI
10.3390/ijms22052704
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Nitrotyrosine is a protein post-translational modification generated by various reactive nitrogen species. Identifying the specific tyrosine residues that undergo nitration is a prerequisite for understanding the molecular function of nitrated proteins. With advances in machine learning, computational prediction can play a vital role ahead of biological experimentation. Herein, we developed a computational predictor, PredNTS, by integrating multiple sequence features, including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. Important features were selected by recursive feature elimination (RFE) using a random forest (RF) classifier. Finally, we linearly combined the probability scores produced by separate RF models, each trained on a single encoding. The resulting PredNTS predictor achieved an area under the curve (AUC) of 0.910 in five-fold cross-validation and outperformed existing predictors on a comprehensive independent dataset. Furthermore, we compared several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. PredNTS is a useful computational resource for predicting nitrotyrosine sites. The web application, together with the curated datasets, is publicly available.
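As a rough illustration of the encoding and fusion steps summarized in the abstract, the Python sketch below computes binary (one-hot), K-mer composition, and CKSAAP features for a tyrosine-centred sequence window, linearly combines per-encoding probability scores, and computes AUC. It is a sketch under stated assumptions, not the PredNTS implementation: the window length, k value, gap range, handling of non-standard residues, and fusion weights are illustrative choices, and all function names are hypothetical. AAindex encoding and the RFE step are omitted (scikit-learn's `sklearn.feature_selection.RFE` wrapping a `RandomForestClassifier` is one way to perform the latter), and the per-encoding scores in the usage example are placeholder numbers, not outputs of trained models.

```python
"""Hypothetical sketch of sequence encodings and linear score fusion.
Window length, k, gap range, and weights are illustrative assumptions,
not the published PredNTS settings."""
from itertools import product

# 20 standard amino acids; other symbols (e.g. '-' padding, 'X') are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encoding(window: str) -> list[int]:
    """One-hot encode each residue as a 20-dim vector (non-standard -> all zeros)."""
    features = []
    for residue in window:
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


def kmer_composition(window: str, k: int = 2) -> list[float]:
    """Normalized frequency of every k-mer built from the 20 standard residues."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(window) - k + 1):
        sub = window[i:i + k]
        if sub in counts:  # skip k-mers containing non-standard residues
            counts[sub] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def cksaap(window: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps 0..k_max (400 features per gap),
    each block normalized by the number of valid pairs at that gap."""
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    features = []
    for gap in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        # residues at positions i and i + gap + 1 form one k-spaced pair
        for i in range(len(window) - gap - 1):
            pair = window[i] + window[i + gap + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


def fuse_scores(score_lists: list[list[float]], weights: list[float]) -> list[float]:
    """Linearly combine per-encoding probability scores; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [sum(w * s for w, s in zip(weights, sample)) for sample in zip(*score_lists)]


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative
    (ties count as 0.5), i.e. the Mann-Whitney formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical 21-residue window with the candidate tyrosine at the centre.
    window = "ACDEFGHIKLYMNPQRSTVWA"
    print(len(binary_encoding(window)), len(kmer_composition(window)), len(cksaap(window)))
    # Placeholder probability scores from two single-encoding models.
    fused = fuse_scores([[0.9, 0.2, 0.6], [0.8, 0.4, 0.3]], [0.5, 0.5])
    print(fused, roc_auc([1, 0, 1], fused))
```

Keeping each encoding in its own function mirrors the abstract's design of training one model per encoding and fusing their scores afterwards; in practice the fusion weights would be tuned on cross-validation folds rather than fixed at equal values.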
Pages: 1-11
Number of pages: 11