Prediction of protein N-formylation and comparison with N-acetylation based on a feature selection method

Cited by: 12
Authors
Zhou, You [1, 2, 3]
Huang, Tao [2, 3]
Huang, Guohua [1]
Zhang, Ning [4]
Kong, XiangYin [2, 3]
Cai, Yu-Dong [1]
Affiliations
[1] Shanghai Univ, Sch Life Sci, Shanghai, Peoples R China
[2] Chinese Acad Sci, Inst Hlth Sci, Shanghai Inst Biol Sci, Shanghai, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Med, Shanghai, Peoples R China
[4] Tianjin Univ, Dept Biomed Engn, Tianjin Key Lab Biomed Engn Measurement, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China; National Research Foundation, Singapore;
Keywords
N-formylation; N-acetylation; Post-translational modification; Random forest; Incremental feature selection; LINKER HISTONE H1; LYSINE ACETYLATION; POSTTRANSLATIONAL MODIFICATIONS; INTRINSIC DISORDER; SITES; METHYLATION; SEQUENCES; PHOSPHORYLATION; IDENTIFICATION; DATABASE;
DOI
10.1016/j.neucom.2015.10.148
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Post-translational modifications play important roles in cell activities ranging from gene regulation to cytoplasmic mechanisms. Unfortunately, experimental methods for investigating protein post-translational modifications, such as high-resolution mass spectrometry, are time-consuming, labor-intensive and expensive. There is therefore a need for computational methods that facilitate fast and efficient identification. In this study, we developed a method to predict N-formylated methionines based on the Dagging method. Various features were incorporated, including PSSM conservation scores, amino acid factors, secondary structures, solvent accessibilities and disorder scores. An optimal set of 28 features was selected using the mRMR (Maximum Relevance Minimum Redundancy) method and the IFS (Incremental Feature Selection) method. The prediction model built on these features achieved an accuracy of 0.9074 and an MCC value of 0.7478. Analysis of the optimal features revealed several factors and sites that play important roles in N-formylation. We also compared N-formylation with N-acetylation, another important type of N-terminal modification of methionines. The top 34 MaxRel (most relevant) features were selected to discriminate between the two types of modification; these may be candidates for studying the different mechanisms of N-formylation and N-acetylation. The results of our study further the understanding of these two types of modification and provide guidance for related validation experiments. (C) 2016 Elsevier B.V. All rights reserved.
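The abstract names two steps that a short sketch can make concrete: incremental feature selection (IFS) over a ranked feature list, and scoring by accuracy and the Matthews correlation coefficient (MCC). The Python below is a minimal illustration of how these are commonly set up, not the authors' implementation. It assumes the features have already been ranked (for example by mRMR, which is not implemented here), and the `evaluate` callback is a hypothetical stand-in for training and cross-validating a classifier on a given feature subset.

```python
import math
from typing import Callable, List, Sequence, Tuple


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Grow the feature set one ranked feature at a time.

    ranked_features: feature names ordered best-first (assumed to come
        from a ranking method such as mRMR).
    evaluate: hypothetical callback that scores a feature subset, e.g.
        by the cross-validated MCC of a classifier trained on it.
    Returns the top-k prefix with the highest score, and that score.
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:  # keep the smallest prefix on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy usage: made-up scores per subset size, not data from the paper.
toy_scores = {1: 0.5, 2: 0.7, 3: 0.6}
subset, score = incremental_feature_selection(
    ["f1", "f2", "f3"], lambda feats: toy_scores[len(feats)]
)
```

In this sketch the strict `>` comparison keeps the smaller subset when two prefixes tie, which favors a more compact feature set; whether the original study used the same tie-breaking rule is not stated in the abstract.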
Pages: 53-62
Page count: 10