A sequence-based computational method for prediction of MoRFs

Cited by: 6
Authors
Wang, Yu [1]
Guo, Yanzhi [1]
Pu, Xuemei [1]
Li, Menglong [1]
Affiliation
[1] Sichuan Univ, Coll Chem, Chengdu 610064, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MOLECULAR RECOGNITION FEATURES; INTRINSICALLY DISORDERED PROTEINS; SECONDARY STRUCTURE; WEB SERVER; BINDING; REGIONS; KNN;
DOI
10.1039/c6ra27161h
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Molecular recognition features (MoRFs) are relatively short segments (10-70 residues) within intrinsically disordered regions (IDRs) that can undergo disorder-to-order transitions upon binding to partner proteins. Because MoRFs play key roles in important biological processes such as signaling and regulation, identifying them is crucial for a full understanding of the functional aspects of IDRs. However, given the relative sparseness of MoRFs in protein sequences, the accuracy of the available MoRF predictors is often inadequate for practical use, which leaves significant room for improvement. In this work, we developed a novel sequence-based predictor for MoRFs using a support vector machine (SVM) algorithm. First, we constructed a comprehensive dataset of annotated MoRFs covering a wide length range of 10 to 70 residues. Our method used the flanking regions to define the negative samples. Then, amino acid composition (AAC) and two previously unexplored features, namely composition, transition and distribution (CTD) and the K nearest neighbors (KNN) score, were used to characterize the sequence information of MoRFs. Finally, after feature evaluation and optimization, an overall accuracy of 75.75% was achieved under five-fold cross-validation. When applied to an independent test set of 110 proteins, the method also yielded a promising accuracy of 64.98%. Additionally, in external validation on the negative samples, our method still shows performance comparable to that of other existing methods. We believe that this study will be useful in elucidating the mechanism of MoRFs and facilitating hypothesis-driven experimental design and validation.
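As an illustration of two ingredients named in the abstract, the sketch below computes an amino acid composition (AAC) vector for a segment and cuts flanking segments around an annotated MoRF to serve as negative samples. It is a minimal sketch, not the authors' implementation: the function names `aac` and `flanking_negatives`, the 0-based half-open coordinates, and the choice to make each flank as long as the MoRF itself are all assumptions, since the abstract does not state how the flanks were sized.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(segment: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of a segment.

    Each entry is the fraction of standard residues in the segment that
    are the corresponding amino acid; non-standard letters are ignored.
    """
    counts = Counter(segment.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def flanking_negatives(sequence: str, start: int, end: int) -> list[str]:
    """Return the upstream and downstream flanks of a MoRF as negatives.

    `start` and `end` are 0-based, half-open MoRF coordinates.
    ASSUMPTION: each flank is at most as long as the MoRF itself; the
    abstract does not specify the flank length.
    """
    length = end - start
    upstream = sequence[max(0, start - length):start]
    downstream = sequence[end:end + length]
    # Drop empty flanks (MoRF at a sequence terminus).
    return [s for s in (upstream, downstream) if s]


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVK"          # hypothetical toy sequence
    morf = seq[5:9]                   # hypothetical MoRF annotation
    print(aac(morf))
    print(flanking_negatives(seq, 5, 9))
```

Feature vectors built this way for MoRF segments (positives) and flanks (negatives) could then be passed to any SVM implementation and scored with five-fold cross-validation, as the abstract describes; the CTD and KNN-score features would be concatenated alongside AAC.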
Pages: 18937-18945
Number of pages: 9