Succinylation Site Prediction Based on Protein Sequences Using the IFS-LightGBM (BO) Model

Cited by: 20
Authors
Zhang, Lu [1]
Liu, Min [1]
Qin, Xinyi [1]
Liu, Guangzhong [1]
Affiliations
[1] Shanghai Maritime Univ, Coll Informat Engn, 1550 Haigang Ave, Shanghai 201306, Peoples R China
Funding
Natural Science Foundation of Shanghai;
Keywords
LYSINE SUCCINYLATION; POSTTRANSLATIONAL MODIFICATION; UBIQUITINATION SITES; IDENTIFICATION; EXPRESSION; PATTERNS; SIRT5; TOOL;
DOI
10.1155/2020/8858489
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07; 0710; 09;
Abstract
Succinylation is an important posttranslational modification of proteins that plays a key role in regulating protein conformation and controlling cellular function. Many studies have shown that succinylation of protein lysine residues is closely related to the occurrence of many diseases. To understand the mechanism of succinylation in depth, it is necessary to identify succinylation sites in proteins accurately. In this study, we develop a new model, IFS-LightGBM (BO), which combines the incremental feature selection (IFS) method, the LightGBM feature selection method, the Bayesian optimization algorithm, and the LightGBM classifier to predict succinylation sites in proteins. Specifically, pseudo amino acid composition (PseAAC), position-specific scoring matrix (PSSM), disorder status, and Composition of k-spaced Amino Acid Pairs (CKSAAP) are first employed to extract feature information. Then, the LightGBM feature selection method is combined with the IFS method to select the optimal feature subset for the LightGBM classifier. Finally, to increase prediction accuracy and reduce the computational load, the Bayesian optimization algorithm is used to optimize the parameters of the LightGBM classifier. The results show that the IFS-LightGBM (BO)-based prediction model performs better when evaluated on common metrics such as accuracy, recall, precision, Matthews Correlation Coefficient (MCC), and F-measure.
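Two of the steps named in the abstract, CKSAAP feature extraction and incremental feature selection, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `cksaap` and `incremental_feature_selection`, the default `k_max = 2`, and the generic `score_subset` callback are assumptions made for illustration, and the record does not state the window size or range of k used in the paper.

```python
from __future__ import annotations

from itertools import product
from typing import Callable, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature indices are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(fragment: str, k_max: int = 2) -> list[float]:
    """CKSAAP vector for gaps k = 0..k_max (400 values per k).

    Each value is the count of a residue pair separated by k positions,
    divided by the number of pair positions in the fragment.
    k_max = 2 is an assumed default, not a value taken from the paper.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        n_windows = len(fragment) - k - 1
        counts = dict.fromkeys(PAIRS, 0)
        for i in range(max(n_windows, 0)):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # ignore padding or non-standard residues
                counts[pair] += 1
        features.extend(
            counts[p] / n_windows if n_windows > 0 else 0.0 for p in PAIRS
        )
    return features


def incremental_feature_selection(
    ranked: Sequence[int],
    score_subset: Callable[[Sequence[int]], float],
) -> tuple[list[int], float]:
    """Grow the feature set one ranked feature at a time; keep the best prefix.

    `ranked` lists feature indices from most to least important.
    `score_subset` is a hypothetical callback returning a validation score
    (higher is better) for a given list of feature indices.
    """
    best_subset: list[int] = []
    best_score = float("-inf")
    for n in range(1, len(ranked) + 1):
        subset = list(ranked[:n])
        score = score_subset(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

In a setup like the one the abstract describes, `ranked` would plausibly come from LightGBM feature importances and `score_subset` would train and cross-validate a LightGBM classifier on the chosen columns; both are left as a callback here so the sketch runs with the standard library alone.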
Pages: 15
Related papers
50 in total
  • [21] Prediction Model of Thermophilic Protein Based on Stacking Method
    Wang, Xian-Fang
    Lu, Fan
    Du, Zhi-Yong
    Li, Qi-Meng
    CURRENT BIOINFORMATICS, 2021, 16 (10) : 1328 - 1340
  • [22] In-silico target prediction by ensemble chemogenomic model based on multi-scale information of chemical structures and protein sequences
    Yang, Su-Qing
    Zhang, Liu-Xia
    Ge, You-Jin
    Zhang, Jin-Wei
    Hu, Jian-Xin
    Shen, Cheng-Ying
    Lu, Ai-Ping
    Hou, Ting-Jun
    Cao, Dong-Sheng
    JOURNAL OF CHEMINFORMATICS, 2023, 15 (01)
  • [23] Attenphos: General Phosphorylation Site Prediction Model Based on Attention Mechanism
    Song, Tao
    Yang, Qing
    Qu, Peng
    Qiao, Lian
    Wang, Xun
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2024, 25 (03)
  • [24] Prediction of recombinant protein overexpression in Escherichia coli using a machine learning based model (RPOLP)
    Habibi, Narjeskhatoon
    Norouzi, Alireza
    Hashim, Siti Z. Mohd
    Shamsir, Mohd Shahir
    Samian, Razip
    COMPUTERS IN BIOLOGY AND MEDICINE, 2015, 66 : 330 - 336
  • [25] Two-Level Protein Methylation Prediction using structure model-based features
    Zheng, Wei
    Wuyun, Qiqige
    Cheng, Micah
    Hu, Gang
    Zhang, Yanping
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [26] PredHydroxy: computational prediction of protein hydroxylation site locations based on the primary structure
    Shi, Shao-Ping
    Chen, Xiang
    Xu, Hao-Dong
    Qiu, Jian-Ding
    MOLECULAR BIOSYSTEMS, 2015, 11 (03) : 819 - 825
  • [27] Prediction of antibiotic resistance mechanisms using a protein language model
    Yagimoto, Kanami
    Hosoda, Shion
    Sato, Miwa
    Hamada, Michiaki
    BIOINFORMATICS, 2024, 40 (10)
  • [28] DeepPTM: Protein Post-translational Modification Prediction from Protein Sequences by Combining Deep Protein Language Model with Vision Transformers
    Soylu, Necla Nisa
    Sefer, Emre
    CURRENT BIOINFORMATICS, 2024, 19 (09) : 810 - 824
  • [29] Using WPNNA Classifier in Ubiquitination Site Prediction Based on Hybrid Features
    Feng, Kai-Yan
    Huang, Tao
    Feng, Kai-Rui
    Liu, Xiao-Jun
    PROTEIN AND PEPTIDE LETTERS, 2013, 20 (03) : 318 - 323
  • [30] DisoFLAG: accurate prediction of protein intrinsic disorder and its functions using graph-based interaction protein language model
    Pang, Yihe
    Liu, Bin
    BMC BIOLOGY, 2024, 22 (01)