Succinylation Site Prediction Based on Protein Sequences Using the IFS-LightGBM (BO) Model

Cited by: 20
Authors
Zhang, Lu [1]
Liu, Min [1]
Qin, Xinyi [1]
Liu, Guangzhong [1]
Affiliations
[1] Shanghai Maritime Univ, Coll Informat Engn, 1550 Haigang Ave, Shanghai 201306, Peoples R China
Funding
Natural Science Foundation of Shanghai
Keywords
LYSINE SUCCINYLATION; POSTTRANSLATIONAL MODIFICATION; UBIQUITINATION SITES; IDENTIFICATION; EXPRESSION; PATTERNS; SIRT5; TOOL;
DOI
10.1155/2020/8858489
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Succinylation is an important posttranslational modification of proteins that plays a key role in regulating protein conformation and controlling cellular function. Many studies have shown that succinylation of protein lysine residues is closely related to the occurrence of many diseases. To understand the mechanism of succinylation in depth, it is necessary to identify succinylation sites in proteins accurately. In this study, we develop a new model, IFS-LightGBM (BO), which combines the incremental feature selection (IFS) method, the LightGBM feature selection method, the Bayesian optimization algorithm, and the LightGBM classifier to predict succinylation sites in proteins. Specifically, pseudo amino acid composition (PseAAC), position-specific scoring matrix (PSSM), disorder status, and composition of k-spaced amino acid pairs (CKSAAP) are first employed to extract feature information. Then, the LightGBM feature selection method is combined with IFS to select the optimal feature subset for the LightGBM classifier. Finally, to increase prediction accuracy and reduce the computational load, the Bayesian optimization algorithm is used to tune the parameters of the LightGBM classifier. The results show that the IFS-LightGBM (BO)-based prediction model performs better when evaluated with common metrics such as accuracy, recall, precision, Matthews correlation coefficient (MCC), and F-measure.
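As an illustration of one of the feature encodings named in the abstract, the following is a minimal sketch of CKSAAP as it is commonly defined: for each gap k, count the residue pairs separated by k positions and normalize by the number of counted pairs. The function name `cksaap`, the default `k_max`, the 20-letter alphabet, and the feature naming scheme are assumptions made for this example; the abstract does not state the settings the authors used.

```python
from itertools import product

# Assumption: only the 20 standard residues are encoded; the record does not
# say whether the authors used an extra padding/gap symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(sequence, k_max=2):
    """Return k-spaced amino acid pair frequencies for gaps 0..k_max.

    For each gap k, every pair (sequence[i], sequence[i + k + 1]) is counted
    and the counts are divided by the number of counted pairs. Pairs that
    contain a symbol outside AMINO_ACIDS are skipped.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = {}
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(sequence) - k - 1):
            pair = sequence[i] + sequence[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        for pair in pairs:
            # 400 features per gap; all zeros if the window is too short.
            features[f"{pair}.gap{k}"] = counts[pair] / total if total else 0.0
    return features


if __name__ == "__main__":
    # Hypothetical toy window; real inputs would be lysine-centered fragments.
    encoded = cksaap("AKAK", k_max=1)
    print(len(encoded), encoded["AK.gap0"], encoded["AA.gap1"])
```

Each sequence window yields 400 × (k_max + 1) values, which could then be concatenated with the other descriptors before feature ranking and selection.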
Pages: 15