Developing optimal non-linear scoring function for protein design

Cited by: 20
Authors
Hu, CY [1 ]
Li, X [1 ]
Liang, J [1 ]
Affiliations
[1] Univ Illinois, SEO, Dept Bioengn, Chicago, IL 60607 USA
Funding
US National Science Foundation;
Keywords
DOI
10.1093/bioinformatics/bth369
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Protein design aims to identify sequences that are compatible with a given protein fold but incompatible with any alternative fold. To select the correct sequences and to guide the search process, a design scoring function is critically important. Such a scoring function should be able to characterize the global fitness landscape of many proteins simultaneously. Results: To find optimal design scoring functions, we introduce two geometric views and propose a formulation using a mixture of non-linear Gaussian kernel functions. We aim to solve a simplified protein sequence design problem. Our goal is to distinguish each native sequence for a major portion of representative protein structures from a large number of alternative decoy sequences, each a fragment from proteins of different folds. Our scoring function perfectly discriminates a set of 440 native proteins from 14 million sequence decoys. We show that no linear scoring function can succeed in this task. In a blind test of unrelated proteins, our scoring function misclassifies only 13 native proteins out of 194. This compares favorably with about three to four times more misclassifications when optimal linear functions reported in the literature are used. We also discuss how to develop protein folding scoring functions.
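As an illustration of the contrast the abstract draws between linear and Gaussian kernel scoring functions, the toy sketch below scores 2D points with a mixture of Gaussian kernels centered on "native" and "decoy" examples. Everything in it is an assumption made for illustration and is not taken from the paper: the names `gaussian_kernel` and `kernel_score`, the equal kernel weights, the width `sigma`, the sign convention (positive means native-like), and the 2D XOR-style data, which stands in for real sequence descriptors. The data is chosen so that no linear function w·c + b can be positive on both natives and negative on both decoys (summing the two native inequalities and the two decoy inequalities gives a contradiction), while the kernel mixture separates them.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_score(c, natives, decoys, sigma=0.5):
    """Toy score: kernel mass from natives minus kernel mass from decoys.

    Assumption: equal weights on every kernel and "positive = native-like";
    a trained function would instead fit per-kernel weights.
    """
    native_term = sum(gaussian_kernel(c, n, sigma) for n in natives)
    decoy_term = sum(gaussian_kernel(c, d, sigma) for d in decoys)
    return native_term - decoy_term

# Hypothetical XOR-style descriptors: not linearly separable.
natives = [(0.0, 0.0), (1.0, 1.0)]
decoys = [(0.0, 1.0), (1.0, 0.0)]

native_scores = [kernel_score(c, natives, decoys) for c in natives]
decoy_scores = [kernel_score(c, natives, decoys) for c in decoys]

print("native scores:", native_scores)  # all positive on this toy data
print("decoy scores:", decoy_scores)    # all negative on this toy data
```

In this sketch the kernel centers are the training points themselves, so each point is dominated by the kernel sitting on it; the real problem involves far higher-dimensional descriptors and weights chosen by optimization.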
Pages: 3080-3098
Page count: 19