Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information

Cited by: 291
Authors
Ahmad, S [1]
Gromiha, MM
Sarai, A
Affiliations
[1] Kyushu Inst Technol, Dept Biochem Sci & Engn, Iizuka, Fukuoka 8208502, Japan
[2] Jamia Millia Islamia, Dept Biosci, New Delhi 110025, India
[3] AIST, Computat Biol Res Ctr, CBRC, Koto Ku, Tokyo 1350064, Japan
DOI
10.1093/bioinformatics/btg432
CLC number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Motivation: Though vitally important to cell function, the mechanism of protein-DNA binding is not yet completely understood. We therefore analysed the relationship between DNA binding and protein sequence composition, solvent accessibility and secondary structure. Using non-redundant databases of transcription factors and protein-DNA complexes, neural network models were developed to utilize the information present in this relationship to predict DNA-binding proteins and their binding residues.
Results: Sequence composition was found to provide sufficient information to predict the probability of a protein binding to DNA with nearly 69% sensitivity at 64% accuracy for the considered proteins; sequence neighbourhood and solvent accessibility information were sufficient to make binding site predictions with 40% sensitivity at 79% accuracy. Detailed analysis of binding residues shows that some three- and five-residue segments frequently bind to DNA and that solvent accessibility plays a major role in binding. Although binding behaviour was not associated with any particular secondary structure, there were interesting exceptions at the residue level. Over-representation of some residues in the binding sites was largely lost at the total sequence level, but a different kind of compositional preference was observed in DNA-binding proteins.
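As an illustration only, the kind of sequence-composition input and the sensitivity/accuracy figures mentioned in the abstract can be sketched as follows. This is not the authors' implementation: the function names are hypothetical, the 20-letter alphabet ordering is arbitrary, and the metric definitions are the standard ones (sensitivity = TP / (TP + FN), accuracy = fraction of correct predictions), which the abstract does not spell out and which may differ from those used in the paper.

```python
from collections import Counter

# The 20 standard amino acids; the ordering here is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard letters are ignored. The result is a 20-element list that
    could serve as the input vector of a composition-based classifier.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def sensitivity(y_true, y_pred):
    """Fraction of true binders (label 1) that are predicted as binders."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true labels."""
    if not y_true:
        return 0.0
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

The same two metric functions apply to both tasks described in the abstract: with one label per protein for DNA-binding protein prediction, or one label per residue for binding-site prediction.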
Pages: 477-486
Page count: 10
Related papers
20 in total
[1]   Real value prediction of solvent accessibility from amino acid sequence [J].
Ahmad, S ;
Gromiha, MM ;
Sarai, A .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2003, 50 (04) :629-635
[2]   NETASA: neural network based prediction of solvent accessibility [J].
Ahmad, S ;
Gromiha, MM .
BIOINFORMATICS, 2002, 18 (06) :819-824
[3]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[4]   The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 [J].
Boeckmann, B ;
Bairoch, A ;
Apweiler, R ;
Blatter, MC ;
Estreicher, A ;
Gasteiger, E ;
Martin, MJ ;
Michoud, K ;
O'Donovan, C ;
Phan, I ;
Pilbout, S ;
Schneider, M .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :365-370
[5]   Cuff JA, 2000, PROTEINS, V40, P502, DOI 10.1002/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-Q
[6]   Comparison between long-range interactions and contact order in determining the folding rate of two-state proteins: Application of long-range order to folding rate prediction [J].
Gromiha, MM ;
Selvaraj, S .
JOURNAL OF MOLECULAR BIOLOGY, 2001, 310 (01) :27-32
[7]   Role of structural and sequence information in the prediction of protein stability changes: comparison between buried and partially buried mutations [J].
Gromiha, MM ;
Oobatake, M ;
Kono, H ;
Uedaira, H ;
Sarai, A .
PROTEIN ENGINEERING, 1999, 12 (07) :549-555
[8]   Removing near-neighbour redundancy from large protein sequence collections [J].
Holm, L ;
Sander, C .
BIOINFORMATICS, 1998, 14 (05) :423-429
[9]   Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features [J].
Kabsch, W ;
Sander, C .
BIOPOLYMERS, 1983, 22 (12) :2577-2637