Identification of DNA-protein Binding Sites through Multi-Scale Local Average Blocks on Sequence Information

Times cited: 33
Authors
Shen, Cong [1, 2]
Ding, Yijie [1, 2]
Tang, Jijun [1, 2, 4]
Song, Jian [3]
Guo, Fei [1, 2]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China
[2] Tianjin Univ, Inst Computat Biol, Tianjin 300350, Peoples R China
[3] Tianjin Univ, Sch Chem Engn & Technol, Tianjin 300350, Peoples R China
[4] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29208 USA
Funding
U.S. National Science Foundation;
Keywords
DNA-protein binding sites; ensemble classifier; feature extraction; random sub-sampling; sparse representation model; FACE RECOGNITION; PREDICTION; RESIDUES;
DOI
10.3390/molecules22122079
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
DNA-protein interactions play pivotal roles in diverse biological processes and are essential for cell metabolism. Identifying them computationally is a prudent way to reduce the cost of in vitro and in vivo experiments. Many state-of-the-art studies have sought to improve the accuracy of DNA-protein binding site prediction. Nevertheless, structure-based approaches are limited when 3D information is unavailable, and predictive performance can still be improved. In this paper, we propose a competitive method, the Multi-scale Local Average Blocks (MLAB) algorithm, to address this issue. Unlike structure-based approaches, MLAB not only extracts local evolutionary information from primary sequences but also uses predicted solvent accessibility. Moreover, the predictor of DNA-protein binding sites is built as an ensemble of weighted sparse representation models with random under-sampling. To evaluate MLAB, we conduct comprehensive experiments on DNA-protein binding site prediction. MLAB achieves MCC values of 0.392, 0.315, 0.439 and 0.245 on the PDNA-543, PDNA-41, PDNA-316 and PDNA-52 datasets, respectively. These results show that MLAB compares favorably with other leading methods: its MCC is higher by at least 0.053, 0.015 and 0.064 on the PDNA-543, PDNA-41 and PDNA-316 datasets, respectively.
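The abstract reports performance as MCC (Matthews correlation coefficient). As a minimal sketch of how that metric is computed from a binary confusion matrix, assuming residues are labeled 1 for binding and 0 for non-binding; the function names `confusion_counts` and `mcc` are illustrative and are not taken from the paper:

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = binding, 0 = non-binding)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = [1, 1, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
    print(round(mcc(*confusion_counts(y_true, y_pred)), 3))
```

MCC ranges from -1 to 1 and stays informative when one class is much rarer than the other, which is typically the case for binding residues and is consistent with the abstract's use of random under-sampling.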
Pages: 20