TargetDBP: Accurate DNA-Binding Protein Prediction Via Sequence-Based Multi-View Feature Learning

Times cited: 46
Authors
Hu, Jun [1]
Zhou, Xiao-Gen [1]
Zhu, Yi-Heng [2]
Yu, Dong-Jun [2]
Zhang, Gui-Jun [1]
Affiliations
[1] Zhejiang Univ Technol, Coll Informat Engn, Hangzhou 310023, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DNA-binding protein prediction; sequence-based; differential evolution; feature selection; support vector machine; AMINO-ACID-COMPOSITION; WEB SERVER; EVOLUTIONARY INFORMATION; DIFFERENTIAL EVOLUTION; IDENTIFICATION; BIOINFORMATICS; CLASSIFIER; SELECTION; PSEAAC; SITES
DOI
10.1109/TCBB.2019.2893634
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Accurately identifying DNA-binding proteins (DBPs) from protein sequence information is an important but challenging task in protein function annotation. In this paper, we establish a novel computational method, named TargetDBP, for accurately identifying DBPs from primary sequences. In TargetDBP, four single-view features, i.e., AAC (Amino Acid Composition), PsePSSM (Pseudo Position-Specific Scoring Matrix), PsePRSA (Pseudo Predicted Relative Solvent Accessibility), and PsePPDBS (Pseudo Predicted Probabilities of DNA-Binding Sites), are first extracted as four base features. Second, a differential evolution algorithm is employed to learn the weights of the four base features. Using the learned weights, we combine the base features in a weighted manner to form the original super feature. A high-quality subset of the super feature is then selected with the SVM-RFE+CBR (Support Vector Machine Recursive Feature Elimination with Correlation Bias Reduction) feature selection algorithm. Finally, the prediction model is trained with a support vector machine on the selected feature subset. We also construct a new gold-standard, non-redundant benchmark dataset from the PDB database to evaluate TargetDBP and compare it with other existing predictors. On this new dataset, TargetDBP achieves higher performance than other state-of-the-art predictors. The TargetDBP web server and datasets are freely available at http://csbio.njust.edu.cn/bioinf/targetdbp/ for academic use.
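The abstract outlines a multi-step pipeline. The sketch below illustrates several of those steps in plain Python under stated assumptions: computing AAC, computing a Chou-Shen-style PsePSSM from a PSSM, weighting and concatenating base features into a super feature, and searching the weights with classic DE/rand/1/bin. It is not the authors' implementation. The sigmoid normalization of PSSM scores, `max_lag`, the DE settings (`pop_size`, `f`, `cr`, `generations`), and the toy fitness are all assumptions; in TargetDBP the fitness would presumably be a prediction-performance measure of the SVM, which is omitted here. PsePRSA and PsePPDBS appear only as hypothetical placeholder vectors, and the SVM-RFE+CBR and SVM steps are not shown.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Amino acid composition: relative frequency of the 20 standard residues."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def pse_pssm(pssm, max_lag=2):
    """Chou-Shen-style PsePSSM: 20 column means plus 20 * max_lag lag terms.

    `pssm` is a list of L rows with 20 raw scores each. The sigmoid
    normalization and max_lag=2 are assumptions, not TargetDBP settings.
    """
    length = len(pssm)
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    p = [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]
    means = [sum(row[j] for row in p) / length for j in range(20)]
    lag_terms = []
    for lag in range(1, max_lag + 1):
        for j in range(20):
            diffs = ((p[i][j] - p[i + lag][j]) ** 2 for i in range(length - lag))
            lag_terms.append(sum(diffs) / (length - lag))
    return means + lag_terms


def weighted_super_feature(base_features, weights):
    """Scale each base feature vector by its weight, then concatenate."""
    combined = []
    for vector, w in zip(base_features, weights):
        combined.extend(w * x for x in vector)
    return combined


def differential_evolution(fitness, dim, bounds=(0.0, 1.0), pop_size=20,
                           generations=100, f=0.5, cr=0.9, seed=0):
    """Maximize `fitness` over [lo, hi]^dim with classic DE/rand/1/bin."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from target vector i.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < cr:
                    v = pop[r1][j] + f * (pop[r2][j] - pop[r3][j])
                    trial.append(min(max(v, lo), hi))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            score = fitness(trial)
            if score >= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


if __name__ == "__main__":
    target = [0.8, 0.6, 0.3, 0.5]  # hypothetical optimum for the toy objective

    def toy_fitness(w):
        # Stand-in objective; a real pipeline would score an SVM here instead.
        return -sum((a - b) ** 2 for a, b in zip(w, target))

    weights, _ = differential_evolution(toy_fitness, dim=4)
    fake_pssm = [[0.0] * 20 for _ in range(5)]  # hypothetical 5-residue PSSM
    features = [aac("MKTAYIAKQR"), pse_pssm(fake_pssm), [0.2, 0.7], [0.1, 0.9]]
    print(len(weighted_super_feature(features, weights)))  # 84 = 20 + 60 + 2 + 2
```

Keeping the weight search separate from feature extraction means the same DE routine can score any classifier: only the `fitness` callable changes.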
Pages: 1419-1429
Number of pages: 11