Protein sumoylation sites prediction based on two-stage feature selection

Cited by: 27
Authors
Lu, Lin [3]
Shi, Xiao-He [5]
Li, Su-Jun [7]
Xie, Zhi-Qun [1]
Feng, Yong-Li [6]
Lu, Wen-Cong [6]
Li, Yi-Xue [4, 7]
Li, Haipeng [1]
Cai, Yu-Dong [1, 2]
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Biol Sci, MPG Partner Inst Computat Biol, Shanghai 200031, Peoples R China
[2] Shanghai Univ, Inst Syst Biol, Shanghai 200244, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Biomed Engn, Shanghai 200240, Peoples R China
[4] Sch Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China
[5] Chinese Acad Sci, Shanghai Inst Biol Sci, Inst Hlth Sci, Shanghai 200025, Peoples R China
[6] Coll Sci, Dept Chem, Shanghai 200444, Peoples R China
[7] Chinese Acad Sci, Shanghai Inst Biol Sci, Key Lab Syst Biol, Shanghai 200031, Peoples R China
Keywords
Prediction; Protein sumoylation; mRMR; AAIndex; Nearest Neighbor Algorithm; Leave-one-out cross-validation; Bioinformatics; ACID INDEX DATABASE; SUMO; CONJUGATION; AAINDEX; UBC9;
DOI
10.1007/s11030-009-9149-5
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010; 081704;
Abstract
Protein sumoylation is one of the most important post-translational modifications, and accurate prediction of sumoylation sites is very useful for proteome analysis. Although the putative consensus motif ΨKXE can be used for this purpose, optimizing prediction models remains a challenge. In this study, we developed a prediction system based on a feature selection strategy. A total of 1,272 peptides of 14 residues each from SUMOsp (Xue et al. [8] Nucleic Acids Res 34:W254-W257, 2006) were investigated, comprising 212 substrates and 1,060 non-substrates. Among the substrates, only 162 conform to the motif ΨKXE. First, the 1,272 peptides were divided into a training set and a test set. All peptides were encoded into feature vectors using hundreds of amino acid properties collected in the Amino Acid Index Database (AAIndex, http://www.genome.jp/aaindex). Then, the mRMR (minimum redundancy-maximum relevance) method was applied to extract the most informative features. Finally, the Nearest Neighbor Algorithm (NNA) was used to build the prediction models. Evaluated by leave-one-out (LOO) cross-validation, the optimal prediction model reaches an accuracy of 84.4% on the training set and 76.4% on the test set. Notably, 180 substrates were correctly predicted, 18 more than with the motif ΨKXE alone. The final selected features indicate that the amino acid residues two positions downstream and one position upstream of the sumoylation site play the most important role in determining whether sumoylation occurs. Because it is built on a feature selection strategy, our prediction system can be used not only for high-throughput prediction of sumoylation sites but also as a tool to investigate the mechanism of sumoylation.
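The pipeline outlined in the abstract (encode each peptide with per-residue property values, classify with a nearest-neighbor rule, score by leave-one-out cross-validation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the property scale `TOY_SCALE` is a made-up placeholder rather than real AAIndex data, the distance is plain Euclidean, the example peptides are invented, and the mRMR selection step is omitted. The regular expression is one common reading of the consensus motif Ψ-K-X-E, with Ψ taken as I, L, or V.

```python
import math
import re

# Hypothetical placeholder scale: one arbitrary number per amino acid.
# These are NOT real AAIndex values.
TOY_SCALE = {aa: float(i) for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}

# Assumed reading of the consensus motif Psi-K-X-E (Psi = I, L, or V).
MOTIF = re.compile(r"[ILV]K.E")


def encode(peptide, scales=(TOY_SCALE,)):
    """Concatenate one value per residue for each property scale."""
    return [scale[aa] for scale in scales for aa in peptide]


def nearest_neighbor_label(query, vectors, labels):
    """Return the label of the stored vector closest to `query` (Euclidean)."""
    best = min(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return labels[best]


def loo_accuracy(vectors, labels):
    """Leave-one-out: predict each sample from all remaining samples."""
    hits = 0
    for i in range(len(vectors)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(vectors[i], rest_vectors, rest_labels) == labels[i]:
            hits += 1
    return hits / len(vectors)


if __name__ == "__main__":
    # Invented 14-residue peptides; label 1 = substrate, 0 = non-substrate.
    peptides = ["AAAAAAIKQEAAAA", "AAAAAAVKTEAAAA",
                "GGGGGGGKGGGGGG", "SSSSSSSKSSSSSS"]
    labels = [1, 1, 0, 0]
    vectors = [encode(p) for p in peptides]
    print("LOO accuracy:", loo_accuracy(vectors, labels))
    print("Motif hits:", [bool(MOTIF.search(p)) for p in peptides])
```

In a fuller version, a feature selection step would reduce each encoded vector to its most informative columns before the nearest-neighbor search; the sketch skips this to stay short.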
Pages: 81-86
Number of pages: 6