Identification of DNA-binding proteins using multi-features fusion and binary firefly optimization algorithm

Times cited: 30
Authors
Zhang, Jian [1]
Gao, Bo [1]
Chai, Haiting [1]
Ma, Zhiqiang [1]
Yang, Guifu [1, 2]
Affiliations
[1] Northeast Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Peoples R China
[2] Northeast Normal Univ, Off Informatizat Management & Planning, Changchun 130117, Peoples R China
Source
BMC BIOINFORMATICS | 2016 / Vol. 17
Keywords
DNA-binding proteins; Binary firefly algorithm; Feature selection; Parameters optimization; FEATURE-SELECTION; FREE-ENERGY; PREDICTION; SEQUENCE; RECOGNITION; SPECIFICITY; DESIGN
DOI
10.1186/s12859-016-1201-8
CLC number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: DNA-binding proteins (DBPs) play fundamental roles in many biological processes, so effective computational tools for identifying DBPs are increasingly needed.
Results: In this study, we proposed an accurate method for predicting DBPs. First, we focused on improving DBP prediction accuracy using information from the sequence alone. Second, we encoded each protein with multiple informative features: an evolutionary conservation profile, secondary structure motifs, and physicochemical properties. Third, we introduced a novel improved Binary Firefly Algorithm (BFA) that both removes redundant or noisy features and selects optimal parameters for the classifier. On two benchmark datasets, our predictor outperformed many state-of-the-art predictors, which demonstrates the effectiveness of the method. Its promising performance on a newly compiled independent test dataset from PDB and on a large-scale dataset from UniProt demonstrates its good generalization ability. In addition, the BFA developed in this research could be of great value in practical optimization applications, especially feature selection problems.
Conclusions: We proposed a highly accurate method for identifying DBPs. A user-friendly web server named iDbP (identification of DNA-binding Proteins) was built and made available for academic use.
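The abstract describes a binary firefly algorithm (BFA) that searches over feature subsets. The Python sketch below shows a generic binary firefly search for feature selection. It is not the authors' improved BFA, and it leaves out the classifier-parameter part of their search. Several details are assumptions chosen for illustration. Each firefly is a 0/1 mask over the features. The normalised Hamming distance stands in for the squared distance r². A V-shaped |tanh| transfer function turns the continuous firefly step into bit-flip probabilities. The parameter values and the toy fitness function are invented, and that function treats features 0, 2 and 5 as hypothetical informative features. In a real DBP predictor, the fitness would more plausibly be a cross-validated classifier score on the selected features.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # one bit per feature: 1 = keep, 0 = drop


def binary_firefly(
    fitness: Callable[[Mask], float],
    n_features: int,
    n_fireflies: int = 10,
    n_iter: int = 30,
    beta0: float = 1.0,   # attractiveness at distance 0 (assumed value)
    gamma: float = 1.0,   # light absorption coefficient (assumed value)
    alpha: float = 0.2,   # weight of the random step (assumed value)
    seed: int = 0,
) -> Tuple[Mask, float, List[float]]:
    """Generic binary firefly search (a sketch, not the paper's improved BFA).

    Returns the best mask found, its fitness, and the best fitness after each iteration.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_fireflies)]
    light = [fitness(m) for m in pop]  # brightness = fitness to maximise
    best_idx = max(range(n_fireflies), key=lambda k: light[k])
    best_mask, best_fit = pop[best_idx][:], light[best_idx]
    history: List[float] = []

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] <= light[i]:
                    continue  # firefly i only moves toward brighter fireflies
                # Assumption: normalised Hamming distance plays the role of r^2.
                r2 = sum(a != b for a, b in zip(pop[i], pop[j])) / n_features
                beta = beta0 * math.exp(-gamma * r2)
                moved = []
                for a, b in zip(pop[i], pop[j]):
                    # Continuous firefly step; |tanh| (a V-shaped transfer
                    # function, assumed here) maps it to a flip probability.
                    step = beta * (b - a) + alpha * (rng.random() - 0.5)
                    flip = rng.random() < abs(math.tanh(step))
                    moved.append(1 - a if flip else a)
                pop[i] = moved
                light[i] = fitness(moved)
                if light[i] > best_fit:  # track the best mask seen so far
                    best_mask, best_fit = moved[:], light[i]
        history.append(best_fit)
    return best_mask, best_fit, history


# Hypothetical toy problem: features 0, 2 and 5 are "informative".
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    # +1 per informative feature kept, -0.1 per feature kept (penalises size)
    return sum(mask[k] for k in INFORMATIVE) - 0.1 * sum(mask)


mask, fit, hist = binary_firefly(toy_fitness, n_features=8, seed=42)
print(mask, round(fit, 2))
```

Individual fireflies can move to worse masks. The best mask seen so far is tracked separately, so the reported best fitness never decreases from one iteration to the next.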
Pages: 12