EL_LSTM: Prediction of DNA-Binding Residue from Protein Sequence by Combining Long Short-Term Memory and Ensemble Learning

Cited by: 23
Authors
Zhou, Jiyun [1, 2]
Lu, Qin [2]
Xu, Ruifeng [1, 3]
Gui, Lin [1 ]
Wang, Hongpeng [1 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hung Hom, Hong Kong, Peoples R China
[3] Harbin Inst Technol, Shenzhen Grad Sch, HIT Campus Shenzhen Univ Town, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Protein sequence; Neural networks; DNA; Support vector machines; Machine learning; Protein-DNA interaction; DNA-binding residue; LSTM; Ensemble learning; Relationship; Bi-grams; Efficient prediction; Sites; Transcription; Conservation
DOI
10.1109/TCBB.2018.2858806
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Most previous work on DNA-binding residue prediction does not consider the relationships between residues. In this paper, we propose a novel approach for DNA-binding residue prediction, referred to as EL_LSTM, which has two main components. The first component is a Long Short-Term Memory (LSTM) network, which learns pairwise relationships between residues through a bi-gram model and then learns feature vectors for all residues. The second component is an ensemble-learning-based classifier, introduced to tackle the data imbalance problem in binding residue prediction. We use a variant of the bagging strategy in ensemble learning to obtain balanced samples. Evaluations on PDNA-224 and DBP-123 show that classifiers that use feature relationships outperform classifiers without them by at least 0.028 on MCC, 1.18 percent on ST, and 0.012 on AUC. This indicates the usefulness of feature relationships for DNA-binding residue prediction. Evaluation of the ensemble learning component shows an improvement of at least 0.021 on MCC, 1.32 percent on ST, and 0.018 on AUC over a single LSTM classifier. Comparisons with state-of-the-art predictors show that the proposed EL_LSTM outperforms them significantly. Further feature analysis validates the effectiveness of LSTM for the prediction of DNA-binding residues.
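The abstract does not spell out which bagging variant is used, so the following is only a minimal sketch of one common way to obtain balanced samples from imbalanced residue labels. It assumes binary labels (1 for binding, 0 for non-binding), splits the majority class into chunks the size of the minority class, and pairs each chunk with all minority samples. The function names `balanced_bags` and `ensemble_score` are hypothetical and do not come from the paper; the base classifier (an LSTM in EL_LSTM) is left out entirely.

```python
import random


def balanced_bags(labels, seed=0):
    """Build balanced training subsets from imbalanced binary labels.

    Assumption (not from the paper): the non-binding residues (label 0)
    are shuffled and cut into chunks as large as the binding set
    (label 1); every bag holds all binding residues plus one chunk.
    Returns a list of sorted index lists, one per bag.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    rng.shuffle(neg)
    size = len(pos)
    bags = []
    for start in range(0, len(neg), size):
        # The final chunk may be smaller when len(neg) is not a
        # multiple of len(pos).
        chunk = neg[start:start + size]
        bags.append(sorted(pos + chunk))
    return bags


def ensemble_score(scores_per_model):
    """Average per-residue scores over the models trained on each bag."""
    n_models = len(scores_per_model)
    return [sum(col) / n_models for col in zip(*scores_per_model)]


if __name__ == "__main__":
    labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # 2 binding, 8 non-binding
    for bag in balanced_bags(labels):
        print(bag)
```

Each bag would be used to train one base classifier, and `ensemble_score` would then combine their per-residue outputs. Averaging is one possible combination rule; majority voting is another.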
Pages: 124-135
Number of pages: 12