CRYSTALP2: sequence-based protein crystallization propensity prediction

Cited by: 63
Authors
Kurgan, Lukasz [1]
Razib, Ali A. [1]
Aghakhani, Sara [1]
Dick, Scott [1]
Mizianty, Marcin [1]
Jahandideh, Samad [2]
Affiliations
[1] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
[2] Shiraz Univ Med Sci, Dept Med Phys, Shiraz, Iran
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
STRUCTURAL GENOMICS; ISOELECTRIC POINT; WEB SERVER; THROUGHPUT; IMPACT; BIOINFORMATICS; CORRELATE; STRATEGY; PROGRESS; DESIGN;
DOI
10.1186/1472-6807-9-50
Chinese Library Classification number
Q6 [Biophysics];
Discipline classification code
071011;
Abstract
Background: Current protocols yield crystals for <30% of known proteins, indicating that automatically identifying crystallizable proteins may improve high-throughput structural genomics efforts. We introduce CRYSTALP2, a kernel-based method that predicts the propensity of a given protein sequence to produce diffraction-quality crystals. This method utilizes the composition and collocation of amino acids, isoelectric point, and hydrophobicity, as estimated from the primary sequence, to generate predictions. CRYSTALP2 extends its predecessor, CRYSTALP, by enabling predictions for sequences of unrestricted size and provides improved prediction quality.
Results: A significant majority of the collocations used by CRYSTALP2 include residues with high conformational entropy, or low entropy and high potential to mediate crystal contacts; notably, such residues are utilized by surface entropy reduction methods. We show that the collocations provide complementary information to the hydrophobicity and isoelectric point. Tests on four datasets show that CRYSTALP2 outperforms several existing sequence-based predictors (CRYSTALP, OB-score, and SECRET). CRYSTALP2's accuracy, MCC, and AROC range between 69.3 and 77.5%, 0.39 and 0.55, and 0.72 and 0.79, respectively. Our predictions are similar in quality and are complementary to the predictions of the most recent ParCrys and XtalPred methods. Our results also suggest that, as work in protein crystallization continues (thereby enlarging the population of proteins with known crystallization propensities), the prediction quality of the CRYSTALP2 method should increase. The prediction model and the datasets used in this contribution can be downloaded from http://biomine.ece.ualberta.ca/CRYSTALP2/CRYSTALP2.html.
Conclusion: CRYSTALP2 provides relatively accurate crystallization propensity predictions for a given protein chain that either outperform or complement the existing approaches. The proposed method can be used to support current efforts towards improving the success rate in obtaining diffraction-quality crystals.
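The abstract names composition, collocation, and hydrophobicity as sequence-derived inputs. The following is a minimal illustrative sketch of how such features could be computed, not the CRYSTALP2 implementation. The function names are hypothetical, the definition of a collocation as a gapped residue pair is an assumption, and the Kyte-Doolittle hydropathy scale is a stand-in for whichever hydrophobicity index the authors actually used. Isoelectric point estimation and the kernel-based classifier are omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed scale; the paper may use another index).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(seq)
    n = len(seq)
    return {aa: (counts[aa] / n if n else 0.0) for aa in AMINO_ACIDS}


def collocation(seq, first, second, gap):
    """Frequency of `first` followed by `second` with `gap` residues in between.

    Hypothetical definition: hits divided by the number of possible positions.
    """
    step = gap + 1
    windows = len(seq) - step
    if windows <= 0:
        return 0.0
    hits = sum(
        1 for i in range(windows)
        if seq[i] == first and seq[i + step] == second
    )
    return hits / windows


def mean_hydropathy(seq):
    """Average hydropathy over standard residues; unknown symbols are skipped."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else 0.0
```

Because every feature here is normalized by sequence length, the resulting vector has the same size for any input, which is one plausible way to support sequences of unrestricted size.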
Pages: 14