gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence

Times cited: 6

Authors:

Zhang, Yan-ping [1]

Wuyunqiqige [2,3]

Zheng, Wei [2,3]

Liu, Shuyi [4]

Zhao, Chunguang [5]

Affiliations:

[1] Hebei Univ Engn, Dept Math, Sch Sci, Handan 056038, Peoples R China

[2] Nankai Univ, Coll Math Sci, 94 Weijin Rd, Tianjin 300071, Peoples R China

[3] Nankai Univ, LPMC, 94 Weijin Rd, Tianjin 300071, Peoples R China

[4] Beijing Normal Univ, Expt High Sch, 14 Erlong Rd, Beijing 100051, Peoples R China

[5] Handan Coll, Dept Math & Phys, Inst Appl Stat, Handan 056005, Peoples R China

Source:

JOURNAL OF THEORETICAL BIOLOGY | 2016 / Vol. 406

Keywords:

DNA-binding proteins; Graphical representation feature; PCA; SVM; AMINO-ACID-COMPOSITION; GRAPHICAL REPRESENTATION; WEB-SERVER; PHYSICOCHEMICAL PROPERTIES; SUBCELLULAR-LOCALIZATION; RECOMBINATION SPOTS; K-TUPLE; IDENTIFICATION; RECOGNITION; CLASSIFICATION

DOI:

10.1016/j.jtbi.2016.06.002

Chinese Library Classification (CLC) number:

Q [Biological sciences]

Discipline classification codes:

07; 0710; 09

Abstract:

DNA-binding proteins are functional proteins that play an important role in many essential biological activities in cells. This paper proposes gDNA-Prot, an effective and fast computational method for predicting DNA-binding proteins that combines a support vector machine (SVM) classifier with a novel kind of feature called the graphical representation. Each protein sequence was described by the occurrence probabilities of the 20 amino acids together with 23 new numerical graphical-representation features, which are derived from 23 physicochemical properties of the 20 amino acids. Principal Component Analysis (PCA) was employed as the feature selection method to remove irrelevant features and reduce redundant ones. Sigmoid-function and min-max normalization were applied with PCA to accelerate training and obtain higher accuracy, and experiments showed that PCA combined with sigmoid normalization gave the best performance. gDNA-Prot was also compared with DNAbinder, iDNA-Prot and DNA-Prot. The results suggested that gDNA-Prot outperformed DNAbinder and iDNA-Prot; although DNA-Prot outperformed gDNA-Prot, gDNA-Prot was faster and more convenient for predicting DNA-binding proteins. The gDNA-Prot method is available at http://sourceforge.net/projects/gdnaprot. (C) 2016 Elsevier Ltd. All rights reserved.
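The abstract names several generic preprocessing steps: the 20 amino-acid occurrence probabilities, two normalization options (min-max and sigmoid), and PCA before the SVM. The sketch below illustrates those steps with NumPy under stated assumptions; it is not the authors' gDNA-Prot code. The 23 graphical-representation features are not reconstructed because the abstract does not define them, the "z-score then logistic function" form of sigmoid normalization is an assumption, and the SVM training step is omitted (a library classifier such as scikit-learn's `SVC` could be fitted on the reduced features).

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aa_composition(seq: str) -> np.ndarray:
    """Occurrence probability of each of the 20 standard amino acids in seq.

    Assumption: non-standard residues (e.g. X, B, Z) are ignored in both the
    counts and the total; the paper's handling is not stated in the abstract.
    """
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def min_max(X: np.ndarray) -> np.ndarray:
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def sigmoid_norm(X: np.ndarray) -> np.ndarray:
    """Assumed sigmoid normalization: z-score each column, then apply 1/(1+e^-z)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)  # constant columns become 0.5
    return 1.0 / (1.0 + np.exp(-(X - mu) / sd))


def pca(X: np.ndarray, k: int) -> np.ndarray:
    """Project the mean-centred rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)  # rows of vt = components
    return Xc @ vt[:k].T


# Usage on a few made-up sequences (illustrative only, not data from the paper).
seqs = ["MKRLLAGKK", "MSTNPKPQRKTKRNTNRRPQDVKFPGG", "MAEGEITTFTALTEKFNLPPG"]
X = np.vstack([aa_composition(s) for s in seqs])
Z = pca(sigmoid_norm(X), k=2)
print(Z.shape)  # (3, 2)
```

Swapping `sigmoid_norm` for `min_max` reproduces the other normalization option the abstract compares; the reduced matrix `Z` would then be the SVM input.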


Pages: 8-16

Number of pages: 9
