A SVM-based Approach for Predicting DNA-binding Residues in Proteins from Amino Acid Sequences

Cited by: 10
Authors
Ma, Xin [1]
Wu, Jian-Sheng [1]
Liu, Hong-De [1]
Yang, Xi-Nan [1]
Xie, Jian-Ming [1]
Sun, Xiao [1]
Affiliations
[1] Southeast Univ, Sch Biol Sci & Med Engn, State Key Lab Bioelect, Nanjing 210096, Peoples R China
Source
2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS | 2009
Keywords
DNA-binding residues; Support vector machine (SVM); position specific scoring matrices (PSSMs); SITES;
DOI
10.1109/IJCBS.2009.33
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Protein-DNA interactions are vitally important in a wide range of biological processes, such as gene regulation and DNA replication and repair. We predict DNA-binding residues in proteins from amino acid sequences using a support vector machine (SVM) with a novel hybrid feature that incorporates evolutionary information from the amino acid sequences and four physicochemical properties: the side chain pKa value, hydrophobicity index, molecular mass, and lone electron pairs of amino acids. The classifier achieves 79.12% total accuracy, with 74.19% sensitivity and 79.20% specificity. An alternative classifier using random forest (RF) is also constructed. Further analysis shows that the hybrid feature contributes clearly to the prediction performance, and that the evolutionary information contributes most to the improvement.
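The three figures reported in the abstract (total accuracy, sensitivity, specificity) are standard confusion-matrix quantities for a binary per-residue classifier. As a minimal sketch, assuming labels are encoded as 1 for a DNA-binding residue and 0 otherwise, they could be computed as follows. The function name `binding_site_metrics` and the toy labels are hypothetical illustrations and are not taken from the paper.

```python
import math


def binding_site_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    Assumed encoding (not from the paper): 1 = binding residue, 0 = non-binding.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else math.nan
    # Sensitivity: fraction of true binding residues that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: fraction of non-binding residues correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return accuracy, sensitivity, specificity


# Toy labels, made up purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
acc, sens, spec = binding_site_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} sensitivity={sens:.4f} specificity={spec:.4f}")
```

Because non-binding residues typically far outnumber binding ones, total accuracy tends to track specificity closely, which is consistent with the near-identical accuracy and specificity values quoted in the abstract.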
Pages: 225-229
Page count: 5