PNImodeler: web server for inferring protein-binding nucleotides from sequence data

Cited by: 15
Authors
Im, Jinyong [1]
Tuvshinjargal, Narankhuu [1]
Park, Byungkyu [1]
Lee, Wook [1]
Huang, De-Shuang [2]
Han, Kyungsook [1]
Affiliations
[1] Inha Univ, Dept Comp Sci & Engn, Inchon, South Korea
[2] Tongji Univ, Coll Elect & Informat Engn, Machine Learning & Syst Biol Lab, Shanghai 201804, Peoples R China
Keywords
DNA; PREDICTION; SITES;
DOI
10.1186/1471-2164-16-S3-S6
Chinese Library Classification (CLC)
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Interactions between DNA and proteins are essential to many biological processes, such as transcriptional regulation and DNA replication. With the increased availability of structures of protein-DNA complexes, several computational studies have been conducted to predict DNA binding sites in proteins. However, few attempts have been made to predict protein binding sites in DNA.

Results: From an extensive analysis of protein-DNA complexes, we identified powerful features of DNA and protein sequences that can be used to predict protein binding sites in DNA sequences. We developed two support vector machine (SVM) models that predict protein-binding nucleotides from DNA and/or protein sequences. One SVM model, which used DNA sequence data alone, achieved a sensitivity of 73.4%, a specificity of 64.8%, an accuracy of 68.9% and a correlation coefficient of 0.382 on a test dataset that was not used in training. The other SVM model, which used both DNA and protein sequences, achieved a sensitivity of 67.6%, a specificity of 74.3%, an accuracy of 71.4% and a correlation coefficient of 0.418.

Conclusions: Predicting binding sites in double-stranded DNAs is a more difficult task than predicting binding sites in single-stranded molecules. Our study showed that protein binding sites in double-stranded DNA molecules can be predicted with an accuracy comparable to that achieved for single-stranded molecules. Our study also demonstrated that using both DNA and protein sequences resulted in better prediction performance than using DNA sequence data alone. The SVM models and datasets constructed in this study are available at http://bclab.inha.ac.kr/pnimodeler.
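
The abstract reports sensitivity, specificity, accuracy and a correlation coefficient but does not define them. The following is a minimal sketch of how these four measures are commonly computed from the counts of a binary confusion matrix. It assumes the standard definitions and assumes that "correlation coefficient" means the Matthews correlation coefficient; neither is confirmed by this record. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion counts.

    tp/fn: binding nucleotides predicted as binding / non-binding.
    tn/fp: non-binding nucleotides predicted as non-binding / binding.
    Assumes the standard definitions; the record itself does not state them.
    """
    total = tp + fp + tn + fn
    # Fraction of true binding sites that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Fraction of true non-binding sites that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Fraction of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient (assumed meaning of
    # "correlation coefficient"); defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not from the paper).
example = binary_metrics(tp=70, fp=40, tn=60, fn=30)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions, sensitivity and specificity can move in opposite directions while accuracy changes only slightly, which is one reason a correlation coefficient is often reported alongside them.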
Pages: 8