Improved Prediction of DNA-Binding Proteins Using Chaos Game Representation and Random Forest

Cited by: 2
Authors
Niu, Xiaohui [1]
Hu, Xuehai [1]
Affiliations
[1] Huazhong Agr Univ, Coll Informat, Wuhan 430070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DNA-binding proteins; chaos game representation; fractal dimension; random forest; AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINES; RIBOSOMAL-RNA-BINDING; STRUCTURAL MOTIFS; FRACTAL DIMENSION; IDENTIFICATION; SEQUENCE; SITES;
DOI
10.2174/1574893611666160223213853
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
DNA-binding proteins (DNA-BPs) play an important role in many biological processes. Next-generation sequencing technologies are now widely used to obtain the genomes of many organisms, so accurate and rapid identification of DNA-BPs would significantly help genome annotation. Chaos game representation (CGR) can reveal information hidden in protein sequences, and the fractal dimension is a key index of the compactness of complex, irregular geometric objects. In this research, to extract the intrinsic correlation between protein sequence and the DNA-binding property, the CGR algorithm and fractal dimension, together with amino acid composition, are used to represent the protein samples. A random forest classifier then predicts DNA-BPs from these sequence-derived features. The resulting predictor is compared with three important existing methods, DNA-Prot, iDNA-Prot and DNAbinder, on the same datasets. On the two benchmark datasets from DNA-Prot and iDNA-Prot, the average accuracies (ACC) are 82.07% and 84.91%, respectively, and the average Matthews correlation coefficients (MCC) are 0.6085 and 0.6981, respectively. The point-to-point comparisons demonstrate that our fractal approach shows some improvements.
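The feature pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how amino acids are mapped onto the CGR plane or how the fractal dimension is estimated, so the 20-vertex circular layout (`VERTICES`), the contraction `ratio`, the box-counting estimator, and all function names here are assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed layout: one vertex per amino acid, evenly spaced on a circle
# inscribed in the unit square. The paper's actual mapping may differ.
VERTICES = {
    a: (0.5 + 0.5 * math.cos(2 * math.pi * i / 20),
        0.5 + 0.5 * math.sin(2 * math.pi * i / 20))
    for i, a in enumerate(AMINO_ACIDS)
}

def amino_acid_composition(seq):
    """Relative frequency of each of the 20 standard amino acids."""
    residues = [c for c in seq.upper() if c in VERTICES]
    counts = Counter(residues)
    n = len(residues)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]

def cgr_points(seq, ratio=0.5):
    """Chaos game: move a fraction `ratio` toward each residue's vertex."""
    x, y = 0.5, 0.5
    points = []
    for c in seq.upper():
        if c not in VERTICES:
            continue  # skip non-standard residues
        vx, vy = VERTICES[c]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points

def box_counting_dimension(points, levels=(2, 4, 8, 16, 32)):
    """Least-squares slope of log(occupied boxes) against log(grid size)."""
    if not points:
        return 0.0
    xs, ys = [], []
    for k in levels:
        boxes = {(min(max(int(px * k), 0), k - 1),
                  min(max(int(py * k), 0), k - 1)) for px, py in points}
        xs.append(math.log(k))
        ys.append(math.log(len(boxes)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / sxx

def feature_vector(seq):
    """20 composition values followed by one fractal-dimension estimate."""
    return amino_acid_composition(seq) + [box_counting_dimension(cgr_points(seq))]

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Vectors produced by `feature_vector` could then be passed to any random forest implementation, with ACC and MCC computed from the resulting confusion matrix; the number and choice of features in the published predictor may differ from this sketch.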
Pages: 156-163
Number of pages: 8