A sequence-based computational method for prediction of MoRFs

Cited by: 6
Authors
Wang, Yu [1 ]
Guo, Yanzhi [1 ]
Pu, Xuemei [1 ]
Li, Menglong [1 ]
Affiliation
[1] Sichuan Univ, Coll Chem, Chengdu 610064, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
MOLECULAR RECOGNITION FEATURES; INTRINSICALLY DISORDERED PROTEINS; SECONDARY STRUCTURE; WEB SERVER; BINDING; REGIONS; KNN;
DOI
10.1039/c6ra27161h
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Molecular recognition features (MoRFs) are relatively short segments (10-70 residues) within intrinsically disordered regions (IDRs) that can undergo disorder-to-order transitions upon binding to partner proteins. Because MoRFs play key roles in important biological processes such as signaling and regulation, identifying them is crucial for a full understanding of the functional aspects of IDRs. However, given the relative sparseness of MoRFs in protein sequences, the accuracy of the available MoRF predictors is often inadequate for practical use, leaving substantial room for improvement. In this work, we developed a novel sequence-based predictor for MoRFs using a support vector machine (SVM) algorithm. First, we constructed a comprehensive dataset of annotated MoRFs covering lengths from 10 to 70 residues. The regions flanking the MoRFs were used to define the negative samples. Then, amino acid composition (AAC) and two previously unexplored features, namely composition, transition and distribution (CTD) and the K nearest neighbors (KNN) score, were used to characterize the sequence information of MoRFs. Finally, after feature evaluation and optimization, the method achieved an overall accuracy of 75.75% under five-fold cross-validation. When applied to an independent test set of 110 proteins, the method also yielded a promising accuracy of 64.98%. Additionally, in external validation on the negative samples, our method showed performance comparable to that of other existing methods. We believe that this study will be useful in elucidating the mechanism of MoRFs and in facilitating hypothesis-driven experimental design and validation.
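As a rough illustration of two steps named in the abstract, the sketch below computes an amino acid composition (AAC) vector and cuts flanking segments around an annotated MoRF to serve as negative samples. It is not the authors' implementation: the function names, the half-open `[start, end)` coordinates, and the default flank length (equal to the MoRF length) are assumptions made here, since the abstract does not specify them.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(segment):
    """Return the fraction of each standard residue in `segment` (20 values).

    Non-standard symbols are ignored; an empty segment gives all zeros.
    """
    counts = Counter(segment.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def flanking_negatives(sequence, start, end, flank=None):
    """Return the segments flanking a MoRF located at sequence[start:end].

    Assumption: `flank` defaults to the MoRF length; the source abstract
    does not state the flank size. Empty flanks (MoRF at a terminus) are
    dropped.
    """
    length = end - start
    flank = length if flank is None else flank
    left = sequence[max(0, start - flank):start]
    right = sequence[end:end + flank]
    return [seg for seg in (left, right) if seg]
```

In such a setup, each MoRF and each flanking segment would be turned into a feature vector (AAC here, with CTD and KNN-score features added separately) before being passed to an SVM classifier.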
Pages: 18937-18945
Number of pages: 9