Identifying tandem Ankyrin repeats in protein structures

Cited by: 21
Authors
Chakrabarty B. [1]
Parekh N. [1]
Affiliations
[1] International Institute of Information Technology, Centre for Computational Natural Sciences and Bioinformatics, Hyderabad
Keywords
Ankyrin repeat; Graph theory; Protein contact network
DOI
10.1186/s12859-014-0440-9
Abstract
Background: Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. It is thus desirable to accurately identify the specific repeat type and its copy number. Weak evolutionary constraints on repeat units, together with insertions and deletions between them, make their identification difficult at the sequence level, so structure-based approaches are desired. The proposed graph spectral approach represents the protein structure as a graph in order to detect one of the most frequently observed structural repeats, the Ankyrin repeat.
Results: A large number of studies have shown that the 3-dimensional topology of a protein structure is well captured by a graph, making it possible to analyze a complex protein structure as a mathematical entity. In this study we show that the eigen spectra profile of a protein structure graph exhibits a unique repetitive pattern for contiguous repeating units, enabling detection of both the repeat region and the repeat type. The proposed approach uses a non-redundant set of 58 Ankyrin proteins to define rules for the detection of Ankyrin repeat motifs. It is evaluated on a set of 370 proteins, comprising 125 known Ankyrin proteins with the remainder being non-solenoid proteins, and its predictions are compared with UniProt annotation, the sequence-based approach RADAR, and the structure-based approach ConSole. To show the efficacy of the approach, we analyzed the complete PDB structural database and identified 641 previously unrecognized Ankyrin repeat proteins. We observe a unique eigen spectra profile for different repeat types and show that the method can be easily extended to detect other repeat types. It is implemented as a web server, AnkPred, which is freely available.
Conclusions: AnkPred provides an elegant and computationally efficient graph-based approach for detecting Ankyrin structural repeats in proteins. By analyzing the eigen spectra of the protein structure graph and secondary structure information, characteristic features of a known repeat family are identified. This method is especially useful in correctly identifying new members of a repeat family. © 2014 Chakrabarty and Parekh; licensee BioMed Central.
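
The graph representation described in the abstract can be illustrated with a minimal sketch. It is not the AnkPred implementation: it assumes residues are nodes placed at C-alpha coordinates, that two residues are linked when they lie within a hypothetical 7 Å cutoff, and that the "eigen spectra profile" refers to the per-residue components of the principal eigenvector of the adjacency matrix. The function names and the idealized helix generator are illustrative stand-ins for coordinates that would normally be read from a PDB file.

```python
import math


def contact_graph(coords, cutoff=7.0):
    """Adjacency matrix of a protein contact network.

    Residues i and j are linked when their C-alpha atoms lie within
    `cutoff` angstroms (the 7.0 default is an assumption, not a value
    taken from the abstract).
    """
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def principal_eigenvector(adj, iterations=500):
    """Power iteration for the eigenvector of the largest eigenvalue.

    Iterating with (A + I) instead of A leaves the eigenvectors unchanged
    but prevents oscillation on bipartite graphs.
    """
    n = len(adj)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [vec[i] + sum(a * v for a, v in zip(adj[i], vec)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def toy_helix(n_residues, rise=1.5, radius=2.3, turn_deg=100.0):
    """Idealized alpha-helix C-alpha trace (hypothetical test input)."""
    coords = []
    for k in range(n_residues):
        angle = math.radians(turn_deg * k)
        coords.append((radius * math.cos(angle), radius * math.sin(angle), rise * k))
    return coords


if __name__ == "__main__":
    adjacency = contact_graph(toy_helix(20))
    profile = principal_eigenvector(adjacency)
    # One value per residue; plotted along the chain this is the kind of
    # profile in which tandem repeats would appear as a recurring pattern.
    for index, value in enumerate(profile, start=1):
        print(index, round(value, 4))
```

On a real repeat protein the profile would be inspected, together with secondary structure information, for a pattern that recurs once per repeat unit. The detection rules themselves are derived in the paper from its training set and are not reproduced here.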