A Novel Sequence-Based Method of Predicting Protein DNA-Binding Residues, Using a Machine Learning Approach

Cited by: 5
Authors
Cai, Yudong [1 ,2 ]
He, ZhiSong [3 ]
Shi, Xiaohe [4 ,5 ]
Kong, Xiangying [4 ,5 ,6 ]
Gu, Lei [7 ]
Xie, Lu [8 ]
Affiliations
[1] Shanghai Univ, Inst Syst Biol, Shanghai 200244, Peoples R China
[2] Fudan Univ, Ctr Computat Syst Biol, Shanghai 200433, Peoples R China
[3] Zhejiang Univ, Dept Bioinformat, Coll Life Sci, Hangzhou 310058, Zhejiang, Peoples R China
[4] Chinese Acad Sci, Shanghai Inst Biol Sci, Inst Hlth Sci, Beijing 100864, Peoples R China
[5] Shanghai Jiao Tong Univ, Sch Med, Shanghai, Peoples R China
[6] Shanghai Jiao Tong Univ, Ruijin Hosp, State Key Lab Med Genom, Shanghai 200025, Peoples R China
[7] Fraunhofer Inst Algorithms & Sci Comp, Dept Bioinformat, Aachen, Germany
[8] Shanghai Ctr Bioinformat Technol, Shanghai 200235, Peoples R China
Keywords
bioinformatics; data mining; machine learning; mRMR; protein-DNA interaction; SITES; INFORMATION; IDENTIFICATION; RECOGNITION; MODELS; MOTIFS; DOMAIN; P53
DOI
10.1007/s10059-010-0093-0
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. For the study of many diseases, researchers must improve their understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, it is necessary to devise an approach for computational prediction. Some in silico methods have been developed, but there are still considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method of predicting protein-DNA binding residues. To make these predictions, we used the properties of the micro-environment of each amino acid from the AAIndex as well as conservation scores. Testing by the cross-validation method, we obtained an overall accuracy of 94.89%. Our method shows that the amino acid micro-environment is important for DNA binding, and that it is possible to identify the protein-DNA binding sites with it.
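As a rough illustration of what a per-residue "micro-environment" encoding might look like, the sketch below turns each position in a sequence into a fixed-length vector of property values for the residue and its neighbors. It is not the authors' implementation: the abstract does not state the window size, the padding scheme, or which AAIndex properties were used, so the function name `window_features`, the `half_width` and `pad_value` parameters, and the choice of a single property (Kyte-Doolittle hydropathy, AAindex entry KYTJ820101) are all assumptions made for the example. Conservation scores, mRMR feature selection, and the classifier itself are left out.

```python
# Kyte-Doolittle hydropathy values (AAindex entry KYTJ820101).
# Assumption: used here as a single illustrative property; the abstract
# does not list the AAIndex properties the authors actually selected.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence, half_width=3, pad_value=0.0):
    """Return one feature vector per residue.

    Each vector holds the property value of the residue at the center and
    of the `half_width` residues on either side. Positions that fall
    outside the sequence are filled with `pad_value` (a hypothetical
    padding choice, not taken from the paper).
    """
    features = []
    for center in range(len(sequence)):
        vector = []
        for pos in range(center - half_width, center + half_width + 1):
            if 0 <= pos < len(sequence):
                vector.append(HYDROPATHY[sequence[pos]])
            else:
                vector.append(pad_value)
        features.append(vector)
    return features


# Tiny usage example: a 4-residue sequence with a window of 3 positions.
vectors = window_features("MKRA", half_width=1)
for residue, vector in zip("MKRA", vectors):
    print(residue, vector)
```

In a fuller pipeline, vectors like these would be extended with further properties and conservation scores, reduced by a feature-selection step, and passed to a classifier evaluated by cross-validation, as the abstract outlines.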
Pages: 99-105
Number of pages: 7
Related papers
50 in total
  • [41] SPPS: A Sequence-Based Method for Predicting Probability of Protein-Protein Interaction Partners
    Liu, Xinyi
    Liu, Bin
    Huang, Zhimin
    Shi, Ting
    Chen, Yingyi
    Zhang, Jian
    PLOS ONE, 2012, 7 (01):
  • [42] A comprehensive comparative review of sequence-based predictors of DNA- and RNA-binding residues
    Yan, Jing
    Friedrich, Stefanie
    Kurgan, Lukasz
    BRIEFINGS IN BIOINFORMATICS, 2016, 17 (01) : 88 - 105
  • [43] A Novel Approach to Predict Core Residues on Cancer-Related DNA-Binding Domains
    Wong, Ka-Chun
    CANCER INFORMATICS, 2016, 15 : 1 - 7
  • [44] Sequence-Based Prediction of Plant Allergenic Proteins: Machine Learning Classification Approach
    Nedyalkova, Miroslava
    Vasighi, Mahdi
    Azmoon, Amirreza
    Naneva, Ludmila
    Simeonov, Vasil
    ACS OMEGA, 2023, : 3698 - 3704
  • [45] A novel approach for predicting DNA splice junctions using hybrid machine learning algorithms
    Indrajit Mandal
    Soft Computing, 2015, 19 : 3431 - 3444
  • [46] A Novel Method of Predicting Protein Disordered Regions Based on Sequence Features
    Zhao, Tong-Hui
    Jiang, Min
    Huang, Tao
    Li, Bi-Qing
    Zhang, Ning
    Li, Hai-Peng
    Cai, Yu-Dong
    BIOMED RESEARCH INTERNATIONAL, 2013, 2013
  • [47] Predicting protein-RNA interaction using sequence derived features and machine learning approach
    Pandey, Chandan
    Sandeep, Rokkam
    Priyam, Aikansh
    Mahapatra, Satyajit
    Sahu, Sitanshu Sekhar
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2018, 19 (03) : 270 - 282
  • [48] GPpred: A Novel Sequence-Based Tool for Predicting Glutamic Proteases Using Optimized Hybrid Encodings
    Firoz, Ahmad
    Malik, Adeel
    Mahajan, Nitin
    Ali, Hani Mohammed
    Kamli, Majid Rasool
    Kim, Chang-Bae
    CATALYSTS, 2024, 14 (12)
  • [49] LGC-DBP: the method of DNA-binding protein identification based on PSSM and deep learning
    Zhu, Yiqi
    Sun, Ailun
    FRONTIERS IN GENETICS, 2024, 15
  • [50] CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning
    Ali Haisam Muhammad Rafid
    Md. Toufikuzzaman
    Mohammad Saifur Rahman
    M. Sohel Rahman
    BMC Bioinformatics, 21