gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence

Cited by: 6
Authors
Zhang, Yan-ping [1]
Wuyunqiqige [2, 3]
Zheng, Wei [2, 3]
Liu, Shuyi [4]
Zhao, Chunguang [5]
Affiliations
[1] Hebei Univ Engn, Dept Math, Sch Sci, Handan 056038, Peoples R China
[2] Nankai Univ, Coll Math Sci, 94 Weijin Rd, Tianjin 300071, Peoples R China
[3] Nankai Univ, LPMC, 94 Weijin Rd, Tianjin 300071, Peoples R China
[4] Beijing Normal Univ, Expt High Sch, 14 Erlong Rd, Beijing 100051, Peoples R China
[5] Handan Coll, Dept Math & Phys, Inst Appl Stat, Handan 056005, Peoples R China
Keywords
DNA-binding proteins; Graphical representation feature; PCA; SVM; AMINO-ACID-COMPOSITION; GRAPHICAL REPRESENTATION; WEB-SERVER; PHYSICOCHEMICAL PROPERTIES; SUBCELLULAR-LOCALIZATION; RECOMBINATION SPOTS; K-TUPLE; IDENTIFICATION; RECOGNITION; CLASSIFICATION;
DOI
10.1016/j.jtbi.2016.06.002
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification code
07; 0710; 09
Abstract
DNA-binding proteins are functional proteins in cells that play an important role in various essential biological activities. This paper proposes gDNA-Prot, an effective and fast computational method for predicting DNA-binding proteins that combines a support vector machine classifier with a novel kind of feature called graphical representation. Each protein sequence was described by the 20 amino acid probabilities and 23 new numerical graphical representation features based on 23 physicochemical properties of the 20 amino acids. Principal Components Analysis (PCA) was employed as the feature selection method to remove irrelevant features and reduce redundant ones. Sigmoid function and min-max normalization methods were applied with PCA to accelerate training and obtain higher accuracy. Experiments demonstrated that PCA with the sigmoid function gave the best performance. gDNA-Prot was also compared with DNAbinder, iDNA-Prot and DNA-Prot. The results suggested that gDNA-Prot outperformed DNAbinder and iDNA-Prot. Although DNA-Prot outperformed gDNA-Prot, gDNA-Prot was faster and more convenient for predicting DNA-binding proteins. Additionally, the proposed gDNA-Prot method is available at http://sourceforge.net/projects/gdnaprot. (C) 2016 Elsevier Ltd. All rights reserved.
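As a rough illustration of two steps named in the abstract, the sketch below computes the 20 amino acid probabilities of a sequence and applies sigmoid and min-max scaling to a feature vector. It is not the authors' implementation: the function names, the alphabet ordering, and the exact scaling formulas are assumptions made for illustration, and the 23 graphical representation features, PCA, and SVM training are not reproduced here.

```python
import math
from collections import Counter

# The 20 standard amino acids; this ordering is an assumption for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each of the 20 standard amino acids."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def sigmoid_scale(values):
    """Map each value into (0, 1) with the logistic function (assumed form)."""
    scaled = []
    for v in values:
        if v >= 0:
            scaled.append(1.0 / (1.0 + math.exp(-v)))
        else:
            # Equivalent form that avoids overflow for large negative inputs.
            e = math.exp(v)
            scaled.append(e / (1.0 + e))
    return scaled


def min_max_scale(values):
    """Linearly rescale values to [0, 1]; a constant vector maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's datasets.
    features = amino_acid_composition("MKVLAAGIVK")
    print(sigmoid_scale(features))
    print(min_max_scale(features))
```

In a full pipeline, scaled feature vectors of this kind would be reduced with PCA and passed to an SVM classifier, as the abstract describes.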
Pages: 8-16
Number of pages: 9
Related papers
30 items in total
  • [21] MK-FSVM-SVDD: A Multiple Kernel-based Fuzzy SVM Model for Predicting DNA-binding Proteins via Support Vector Data Description
    Zou, Yi
    Wu, Hongjie
    Guo, Xiaoyi
    Peng, Li
    Ding, Yijie
    Tang, Jijun
    Guo, Fei
    CURRENT BIOINFORMATICS, 2021, 16 (02) : 274 - 283
  • [22] Predict prokaryotic proteins through detecting N-formylmethionine residues in protein sequences using support vector machine
    Yang, Zheng Rong
    BIOSYSTEMS, 2009, 97 (03) : 141 - 145
  • [23] Sequence-Based Prediction of Protein-Peptide Binding Sites Using Support Vector Machine
    Taherzadeh, Ghazaleh
    Yang, Yuedong
    Zhang, Tuo
    Liew, Alan Wee-Chung
    Zhou, Yaoqi
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2016, 37 (13) : 1223 - 1229
  • [24] FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation
    Zou, Yi
    Ding, Yijie
    Peng, Li
    Zou, Quan
    INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES, 2022, 14 (02) : 372 - 384
  • [25] FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation
    Yi Zou
    Yijie Ding
    Li Peng
    Quan Zou
    Interdisciplinary Sciences: Computational Life Sciences, 2022, 14 : 372 - 384
  • [26] SVM-Root: Identification of Root-Associated Proteins in Plants by Employing the Support Vector Machine with Sequence-Derived Features
    Meher, Prabina Kumar
    Hati, Siddhartha
    Sahu, Tanmaya Kumar
    Pradhan, Upendra
    Gupta, Ajit
    Rath, Surya Narayan
    CURRENT BIOINFORMATICS, 2024, 19 (01) : 91 - 102
  • [27] Prediction of microRNA-binding residues in protein using a Laplacian support vector machine based on sequence information
    Ma, Xin
    Guo, Jing
    Sun, Xiao
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2018, 16 (03)
  • [28] A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen-Shannon Divergence
    Dang, Truong Khanh Linh
    Meckbach, Cornelia
    Tacke, Rebecca
    Waack, Stephan
    Gueltas, Mehmet
    ENTROPY, 2016, 18 (10)
  • [29] DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool
    Motion, Graham B.
    Howden, Andrew J. M.
    Huitema, Edgar
    Jones, Susan
    NUCLEIC ACIDS RESEARCH, 2015, 43 (22)
  • [30] DIFFERENTIAL INTERACTION OF THE DUAL ALPHA-TROPOMYOSIN/N5 ENHANCER WITH MULTIPLE DNA-BINDING PROTEINS - N5 IS A PUTATIVE NOVEL Z-ZIP DNA-BINDING PROTEIN
    RUIZOPAZO, N
    CLOIX, JF
    HERRERA, VLM
    CELLULAR & MOLECULAR BIOLOGY RESEARCH, 1994, 40 (04) : 265 - 272