Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences

Cited by: 10
Authors
Wang, Wei [1, 2]
Sun, Lin [1]
Zhang, Shiguang [1]
Zhang, Hongjun [3]
Shi, Jinling [4]
Xu, Tianhe [1]
Li, Keliang [1]
Affiliations
[1] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Henan Province, Peoples R China
[2] Engn Technol Res Ctr Comp Intelligence & Data Min, Lab Computat Intelligence & Informat Proc, Xinxiang 453007, Henan Province, Peoples R China
[3] Anyang Univ, Sch Aviat Engn, Anyang 455000, Henan Province, Peoples R China
[4] Xuchang Univ, Sch Int Educ, Xuchang 461000, Henan Province, Peoples R China
Source
BMC BIOINFORMATICS | 2017 / Vol. 18
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
SSBs (Single-stranded DNA-binding proteins); DSBs (Double-stranded DNA-binding proteins); Binding specificity; Protein sequence; SUBCELLULAR-LOCALIZATION; OB-FOLD; EVOLUTIONARY; RECOGNITION; SPECIFICITY; FEATURES; SITES; IDENTIFICATION; INTERFACE; DOMAINS;
DOI
10.1186/s12859-017-1715-8
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: DNA-binding proteins perform important functions in many biological activities. They can interact with single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA), and can accordingly be categorized as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences can help to annotate protein functions and to understand binding specificity. In this study, we systematically consider a variety of schemes for representing protein sequences: overall amino acid composition (OAAC) features, dipeptide compositions, position-specific scoring matrix (PSSM) profiles, and split amino acid composition (SAA). We then adopt support vector machine (SVM) and random forest (RF) classification models to distinguish SSBs from DSBs. Results: Our results suggest that some sequence features can significantly differentiate DSBs from SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an AUC (area under the curve) of 0.919. Moreover, our method performs well in independent testing. Conclusions: Using various sequence-derived features, a novel method is proposed to distinguish DSBs from SSBs accurately. The method also explores novel features, which could help to uncover the binding specificity of DNA-binding proteins.
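The two simplest sequence representations named in the abstract, overall amino acid composition and dipeptide composition, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 20-letter alphabet ordering, and the choice to skip non-standard residues are all assumptions made here for illustration. PSSM profiles and split amino acid composition are omitted.

```python
from itertools import product

# Assumed ordering of the 20 standard amino acids (illustrative choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Return 20 frequencies, one per standard amino acid."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 frequencies, one per ordered pair of adjacent residues."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Pairs containing a non-standard residue are skipped (assumption).
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def sequence_features(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return aa_composition(seq) + dipeptide_composition(seq)
```

A vector built this way for each protein could then be passed to any SVM or random forest implementation, with class labels marking SSBs and DSBs.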
Pages: 10