gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence

Cited by: 6
Authors
Zhang, Yan-ping [1]
Wuyunqiqige [2,3]
Zheng, Wei [2,3]
Liu, Shuyi [4]
Zhao, Chunguang [5]
Affiliations
[1] Hebei Univ Engn, Dept Math, Sch Sci, Handan 056038, Peoples R China
[2] Nankai Univ, Coll Math Sci, 94 Weijin Rd, Tianjin 300071, Peoples R China
[3] Nankai Univ, LPMC, 94 Weijin Rd, Tianjin 300071, Peoples R China
[4] Beijing Normal Univ, Expt High Sch, 14 Erlong Rd, Beijing 100051, Peoples R China
[5] Handan Coll, Dept Math & Phys, Inst Appl Stat, Handan 056005, Peoples R China
Keywords
DNA-binding proteins; Graphical representation feature; PCA; SVM; AMINO-ACID-COMPOSITION; GRAPHICAL REPRESENTATION; WEB-SERVER; PHYSICOCHEMICAL PROPERTIES; SUBCELLULAR-LOCALIZATION; RECOMBINATION SPOTS; K-TUPLE; IDENTIFICATION; RECOGNITION; CLASSIFICATION
DOI
10.1016/j.jtbi.2016.06.002
Chinese Library Classification
Q [Biological Sciences]
Discipline codes
07; 0710; 09
Abstract
DNA-binding proteins are functional proteins that play important roles in many essential biological activities in cells. This paper proposes gDNA-Prot, an effective and fast computational method for predicting DNA-binding proteins that combines a support vector machine (SVM) classifier with a novel kind of feature called graphical representation. Each protein sequence is described by the 20 occurrence probabilities of the amino acids together with 23 new numerical graphical-representation features, which are derived from 23 physicochemical properties of the 20 amino acids. Principal Component Analysis (PCA) was employed as the feature selection method to remove irrelevant features and reduce redundant ones. Sigmoid-function and min-max normalization were each combined with PCA to accelerate training and obtain higher accuracy; experiments showed that PCA with sigmoid normalization gave the best performance. gDNA-Prot was also compared with DNAbinder, iDNA-Prot and DNA-Prot. The results suggest that gDNA-Prot outperforms DNAbinder and iDNA-Prot. Although DNA-Prot outperforms gDNA-Prot, gDNA-Prot is faster and more convenient for predicting DNA-binding proteins. The gDNA-Prot method is available at http://sourceforge.net/projects/gdnaprot. © 2016 Elsevier Ltd. All rights reserved.
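Two of the ingredients named in the abstract are simple enough to make concrete: the 20 amino-acid probabilities and the two normalization schemes (sigmoid and min-max) that were compared alongside PCA. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the alphabetical residue order, ignoring non-standard residues such as 'X', column-wise min-max scaling, and the logistic function as the "sigmoid" are all assumptions, and the function names are hypothetical. The 23 graphical-representation features, the PCA step and the SVM are not reproduced, because their details are not given here.

```python
import math

# The 20 standard amino acids in one-letter code (alphabetical order is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the 20 amino-acid occurrence probabilities of a protein sequence.

    Hypothetical helper: characters outside the 20 standard residues are ignored,
    which is an assumption of this sketch.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def min_max_normalize(rows):
    """Scale each feature column to [0, 1] across all samples (rows).

    A constant column is mapped to 0.0 to avoid dividing by zero.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def _logistic(x):
    """Numerically stable logistic function 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # only evaluated for x < 0, so this cannot overflow
    return z / (1.0 + z)


def sigmoid_normalize(rows):
    """Map every feature value into [0, 1] with the logistic function."""
    return [[_logistic(x) for x in row] for row in rows]
```

For example, `amino_acid_composition("AAAC")` assigns 0.75 to A, 0.25 to C and 0.0 to the other 18 residues. Min-max scaling depends on the column minima and maxima of whatever rows it is given, so in a train/test setting one would normally compute them on the training set and reuse them for test sequences. The sigmoid mapping is applied to each value independently and needs no such statistics.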
Pages: 8-16
Number of pages: 9