gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence

Cited by: 6

Authors:

Zhang, Yan-ping ^{[1]}

Wuyunqiqige ^{[2,3]}

Zheng, Wei ^{[2,3]}

Liu, Shuyi ^{[4]}

Zhao, Chunguang ^{[5]}

Affiliations:

[1] Hebei Univ Engn, Dept Math, Sch Sci, Handan 056038, Peoples R China

[2] Nankai Univ, Coll Math Sci, 94 Weijin Rd, Tianjin 300071, Peoples R China

[3] Nankai Univ, LPMC, 94 Weijin Rd, Tianjin 300071, Peoples R China

[4] Beijing Normal Univ, Expt High Sch, 14 Erlong Rd, Beijing 100051, Peoples R China

[5] Handan Coll, Dept Math & Phys, Inst Appl Stat, Handan 056005, Peoples R China

Source:

JOURNAL OF THEORETICAL BIOLOGY | 2016 / Vol. 406

Keywords:

DNA-binding proteins; Graphical representation feature; PCA; SVM; AMINO-ACID-COMPOSITION; GRAPHICAL REPRESENTATION; WEB-SERVER; PHYSICOCHEMICAL PROPERTIES; SUBCELLULAR-LOCALIZATION; RECOMBINATION SPOTS; K-TUPLE; IDENTIFICATION; RECOGNITION; CLASSIFICATION;

DOI:

10.1016/j.jtbi.2016.06.002

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification code:

07 ; 0710 ; 09 ;

Abstract:

DNA-binding proteins are functional proteins in cells that play an important role in various essential biological activities. This paper proposes gDNA-Prot, an effective and fast computational method for predicting DNA-binding proteins that combines a support vector machine classifier with a novel kind of feature called graphical representation. Each protein sequence was described by the 20 amino acid probabilities and 23 new numerical graphical representation features, based on 23 physicochemical properties of the 20 amino acids. Principal Components Analysis (PCA) was employed as a feature selection method to remove irrelevant features and reduce redundant ones. The sigmoid function and min-max normalization were applied with PCA to accelerate training and obtain higher accuracy. Experiments demonstrated that PCA with the sigmoid function generated the best performance. gDNA-Prot was also compared with DNAbinder, iDNA-Prot and DNA-Prot. The results suggested that gDNA-Prot outperformed DNAbinder and iDNA-Prot. Although DNA-Prot outperformed gDNA-Prot, gDNA-Prot was faster and more convenient for predicting DNA-binding proteins. Additionally, the proposed gDNA-Prot method is available at http://sourceforge.net/projects/gdnaprot. (C) 2016 Elsevier Ltd. All rights reserved.
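
As a rough illustration of two ingredients named in the abstract, the sketch below computes the 20 amino acid probabilities of a sequence and applies sigmoid and min-max normalization to a feature vector. It is not the authors' implementation: the function names, the residue ordering, and the toy sequence are hypothetical, and the 23 graphical representation features, the PCA step, and the SVM classifier are omitted because the abstract does not give enough detail to reproduce them.

```python
import math

# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each of the 20 standard amino acids."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def sigmoid(value):
    """Logistic function, written to avoid overflow for large |value|."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    e = math.exp(value)
    return e / (1.0 + e)


def sigmoid_normalize(values):
    """Map every feature value into the open interval (0, 1)."""
    return [sigmoid(v) for v in values]


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant vector maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy sequence, used only to show the call pattern.
composition = amino_acid_composition("MKVLAAGIVA")
scaled = min_max_normalize(composition)
squashed = sigmoid_normalize(composition)
```

In a fuller pipeline these vectors would be extended with the graphical features, reduced with PCA, and passed to an SVM; how and where the paper applies each normalization relative to PCA is not specified in this record.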


Pages: 8-16

Number of pages: 9

30 items in total

[1] newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation
Zhang, Yanping
Xu, Jun
Zheng, Wei
Zhang, Chen
Qiu, Xingye
Chen, Ke
Ruan, Jishou
COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2014, 52 : 51 - 59
[2] Identification of DNA-Binding Proteins Using Support Vector Machine with Sequence Information
Ma, Xin
Wu, Jiansheng
Xue, Xiaoyun
COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2013, 2013
[3] Identification of DNA-Binding Proteins by Multiple Kernel Support Vector Machine and Sequence Information
Ding, Yijie
Chen, Feng
Guo, Xiaoyi
Tang, Jijun
Wu, Hongjie
CURRENT PROTEOMICS, 2020, 17 (04) : 302 - 310
[4] Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation
Xu, Ruifeng
Zhou, Jiyun
Wang, Hongpeng
He, Yulan
Wang, Xiaolong
Liu, Bin
BMC SYSTEMS BIOLOGY, 2015, 9
[5] Identification of DNA-Binding Proteins via Hypergraph Based Laplacian Support Vector Machine
Qian, Yuqing
Meng, Hao
Lu, Weizhong
Liao, Zhijun
Ding, Yijie
Wu, Hongjie
CURRENT BIOINFORMATICS, 2022, 17 (01) : 108 - 117
[6] FermatS: A Novel Numerical Representation for Protein Sequence Comparison and DNA-binding Protein Identification
Zhang, Yanping
Gao, Ya
Ni, Jianwei
Chen, Pengcheng
Wang, Xiaosheng
COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2021, 24 (10) : 1746 - 1753
[7] Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence
Cai, YD
Lin, SL
BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS, 2003, 1648 (1-2) : 127 - 133
[8] Using hidden Markov models to predict DNA-binding proteins with sequence and structure information
Hsu, Yi-Yu
Chen, Wei-Jhih
Chen, Shu-Hui
Kao, Hung-Yu
SOFT COMPUTING, 2014, 18 (12) : 2365 - 2376
[9] RF-SVM: Identification of DNA-binding proteins based on comprehensive feature representation methods and support vector machine
Zhang, Yanping
Ni, Jianwei
Gao, Ya
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2022, 90 (02) : 395 - 404
[10] A Novel Sequence-Based Method of Predicting Protein DNA-Binding Residues, Using a Machine Learning Approach
Cai, Yudong
He, ZhiSong
Shi, Xiaohe
Kong, Xiangying
Gu, Lei
Xie, Lu
MOLECULES AND CELLS, 2010, 30 (02) : 99 - 105
