Identification of DNA-protein Binding Sites through Multi-Scale Local Average Blocks on Sequence Information

Cited by: 33

Authors:

Shen, Cong ^{[1,2]}

Ding, Yijie ^{[1,2]}

Tang, Jijun ^{[1,2,4]}

Song, Jian ^{[3]}

Guo, Fei ^{[1,2]}

Affiliations:

[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China

[2] Tianjin Univ, Inst Computat Biol, Tianjin 300350, Peoples R China

[3] Tianjin Univ, Sch Chem Engn & Technol, Tianjin 300350, Peoples R China

[4] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29208 USA

Source:

MOLECULES | 2017 / Vol. 22 / Issue 12

Funding:

National Science Foundation (USA);

Keywords:

DNA-protein binding sites; ensemble classifier; feature extraction; random sub-sampling; sparse representation model; FACE RECOGNITION; PREDICTION; RESIDUES;

DOI:

10.3390/molecules22122079

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

DNA-protein interactions play pivotal roles in diverse biological processes and are essential for cell metabolism. Identifying them computationally is a prudent way to reduce the cost of in vitro and in vivo experiments. A variety of state-of-the-art studies have sought to improve the accuracy of DNA-protein binding site prediction. Nevertheless, structure-based approaches are limited when 3D information is unavailable, and predictive performance can still be improved. In this paper, we propose a competitive method called the Multi-scale Local Average Blocks (MLAB) algorithm to address this issue. Unlike structure-based approaches, MLAB not only extracts local evolutionary information from primary sequences but also uses predicted solvent accessibility. Moreover, the predictor of DNA-protein binding sites is built as an ensemble of weighted sparse representation models with random under-sampling. To evaluate the performance of MLAB, we conduct comprehensive experiments on DNA-protein binding site prediction. MLAB achieves Matthews correlation coefficient (MCC) values of 0.392, 0.315, 0.439 and 0.245 on the PDNA-543, PDNA-41, PDNA-316 and PDNA-52 datasets, respectively. These results show that MLAB compares favorably with other outstanding methods: its MCC is higher by at least 0.053, 0.015 and 0.064 on the PDNA-543, PDNA-41 and PDNA-316 datasets, respectively.
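
The abstract reports its results as MCC, the Matthews correlation coefficient. As a reading aid, the sketch below shows how this metric is conventionally computed from a binary confusion matrix. It is not the authors' evaluation code. The function names `mcc` and `mcc_from_labels` and the toy labels are assumptions made for illustration, and returning 0.0 when the denominator vanishes is one common convention rather than something stated in the record.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (an assumed convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def mcc_from_labels(y_true, y_pred) -> float:
    """Count the confusion matrix from 0/1 labels, then compute MCC.

    Here 1 would mean "binding residue" and 0 "non-binding residue".
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return mcc(tp, tn, fp, fn)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = [1, 1, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
    print(round(mcc_from_labels(y_true, y_pred), 3))
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which is why it is often preferred over plain accuracy when one class (here, binding residues) is much rarer than the other.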

Pages: 20
