BBK* (Branch and Bound Over K*): A Provable and Efficient Ensemble-Based Protein Design Algorithm to Optimize Stability and Binding Affinity Over Large Sequence Spaces

Cited by: 18
Authors
Ojewole, Adegoke A. [1, 2]
Jou, Jonathan D. [1]
Fowler, Vance G. [3]
Donald, Bruce R. [1, 4]
Affiliations
[1] Duke Univ, Dept Comp Sci, D101 Levine Sci Res Ctr LSRC Res Dr, Durham, NC 27708 USA
[2] Duke Univ, Computat Biol & Bioinformat Program, Durham, NC USA
[3] Duke Univ, Med Ctr, Div Infect Dis, Durham, NC USA
[4] Duke Univ, Med Ctr, Dept Biochem, Durham, NC 27710 USA
Keywords
molecular ensembles; OSPREY; predicting binding affinity; protein design; structural biology; sublinear algorithms; DEAD-END-ELIMINATION; SIDE-CHAIN; GRAMICIDIN SYNTHETASE; SEARCH ALGORITHM; COMPUTATION; RESISTANCE; FRAMEWORK; REDESIGN; SPECIFICITY; INFECTION
DOI
10.1089/cmb.2017.0267
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Computational protein design (CPD) algorithms that compute binding affinity, K_a, search for sequences with an energetically favorable free energy of binding. Recent work shows that three principles improve the biological accuracy of CPD: ensemble-based design, continuous flexibility of backbone and side-chain conformations, and provable guarantees of accuracy with respect to the input. However, previous methods that use all three design principles are single-sequence (SS) algorithms, which are very costly: linear in the number of sequences and thus exponential in the number of simultaneously mutable residues. To address this computational challenge, we introduce BBK*, a new CPD algorithm whose key innovation is the multisequence (MS) bound: BBK* efficiently computes a single provable upper bound to approximate K_a for a combinatorial number of sequences, and avoids SS computation for all provably suboptimal sequences. Thus, to our knowledge, BBK* is the first provable, ensemble-based CPD algorithm to run in time sublinear in the number of sequences. Computational experiments on 204 protein design problems show that BBK* finds the tightest binding sequences while approximating K_a for up to 10^5-fold fewer sequences than the previous state-of-the-art algorithms, which require exhaustive enumeration of sequences. Furthermore, for 51 protein-ligand design problems, BBK* provably approximates K_a up to 1982-fold faster than the previous state-of-the-art iMinDEE/A*/K* algorithm. Therefore, BBK* not only accelerates protein designs that are possible with previous provable algorithms, but also efficiently performs designs that are too large for previous methods.
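The pruning idea in the abstract (one upper bound covering many sequences, with single-sequence work skipped for provably suboptimal ones) can be illustrated with a generic best-first branch-and-bound search. The sketch below is a toy under stated assumptions, not the BBK* implementation: the additive per-residue scores in `TOY_SCORES`, the function names, and the bound in `toy_upper_bound` are all hypothetical stand-ins, whereas the real algorithm bounds K_a through ensemble (partition function) computations.

```python
import heapq
from itertools import count


def best_first_design(options, score, upper_bound):
    """Best-first branch and bound over a sequence tree.

    options[i] lists the residue choices at mutable position i.
    upper_bound(partial) must never underestimate score(full) for any
    full sequence that extends `partial` (an admissible bound).
    Returns (best_sequence, best_score, number_of_full_evaluations).
    """
    tie = count()  # unique tiebreaker so heap entries never compare tuples
    heap = [(-upper_bound(()), next(tie), (), False)]
    evaluated = 0
    while heap:
        neg_value, _, partial, exact = heapq.heappop(heap)
        if exact:
            # An exactly scored sequence beats every remaining upper bound.
            return partial, -neg_value, evaluated
        if len(partial) == len(options):
            # Only now pay for the (expensive) single-sequence computation.
            evaluated += 1
            heapq.heappush(heap, (-score(partial), next(tie), partial, True))
            continue
        for residue in options[len(partial)]:
            child = partial + (residue,)
            heapq.heappush(heap, (-upper_bound(child), next(tie), child, False))
    return None, float("-inf"), evaluated


# Hypothetical toy data: an additive score per position (assumption, not K_a).
TOY_SCORES = [
    {"A": 1.0, "L": 3.0, "F": 2.0},
    {"G": 0.5, "W": 4.0},
    {"K": 2.5, "E": 1.5, "D": 0.1},
]
OPTIONS = [sorted(position) for position in TOY_SCORES]


def toy_score(sequence):
    """Exact score of a fully assigned sequence."""
    return sum(TOY_SCORES[i][aa] for i, aa in enumerate(sequence))


def toy_upper_bound(partial):
    """Assigned positions score exactly; unassigned ones take their maximum."""
    assigned = sum(TOY_SCORES[i][aa] for i, aa in enumerate(partial))
    remaining = sum(max(p.values()) for p in TOY_SCORES[len(partial):])
    return assigned + remaining


best, value, n_eval = best_first_design(OPTIONS, toy_score, toy_upper_bound)
print(best, value, n_eval)
```

In this toy the search space holds 3 x 2 x 3 = 18 sequences, yet the best one is returned after far fewer exact evaluations, because every other branch is discarded on its upper bound alone. How much pruning a real design achieves depends on how tight the bound is.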
Pages: 726-739
Number of pages: 14