Computational protein design as an optimization problem

被引:51
作者
Allouche, David [1 ]
Andre, Isabelle [2 ,3 ,4 ]
Barbe, Sophie [2 ,3 ,4 ]
Davies, Jessica [1 ]
de Givry, Simon [1 ]
Katsirelos, George [1 ]
O'Sullivan, Barry [5 ]
Prestwich, Steve [5 ]
Schiex, Thomas [1 ]
Traore, Seydou [2 ,3 ,4 ]
机构
[1] INRA, M1AT, UR 875, F-31320 Castanet Tolosan, France
[2] Univ Toulouse, INSA, UPS, INP,LISBP, F-31077 Toulouse, France
[3] CNRS, UMR5504, F-31400 Toulouse, France
[4] INRA, UMR792, F-31400 Toulouse, France
[5] Univ Coll Cork, Insight Ctr Data Analyt, Cork, Ireland
基金
爱尔兰科学基金会;
关键词
Weighted constraint satisfaction problem; Soft constraints; Neighborhood substitutability; Constraint optimization; Graphical model; Cost function networks; Integer linear programming; Quadratic programming; Computational protein design; Bioinformatics; Maximum a posteriori inference; Maximum satisfiability; DEAD-END-ELIMINATION; SIDE-CHAIN; ALGORITHMS; SEARCH; CONSISTENCY; STRATEGIES; PREDICTION; CRITERION; SEQUENCES; REDESIGN;
D O I
10.1016/j.artint.2014.03.005
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Proteins are chains of simple molecules called amino acids. The three-dimensional shape of a protein and its amino acid composition define its biological function. Over millions of years, living organisms have evolved a large catalog of proteins. By exploring the space of possible amino acid sequences, protein engineering aims at similarly designing tailored proteins with specific desirable properties. In Computational Protein Design (CPD), the challenge of identifying a protein that performs a given task is defined as the combinatorial optimization of a complex energy function over amino acid sequences. In this paper, we introduce the CPD problem and some of the main approaches that have been used by structural biologists to solve it, with an emphasis on the exact method embodied in the dead-end elimination/A* algorithm (DEE/A*). The CPD problem is a specific form of binary Cost Function Network (CFN, aka Weighted CSP). We show how DEE algorithms can be incorporated and suitably modified to be maintained during search, at reasonable computational cost. We then evaluate the efficiency of CFN algorithms as implemented in our solver toulbar2, on a set of real CPD instances built in collaboration with structural biologists. The CPD problem can be easily reduced to 0/1 Linear Programming, 0/1 Quadratic Programming, 0/1 Quadratic Optimization, Weighted Partial MaxSAT and Graphical Model optimization problems. We compare toulbar2 with these different approaches using a variety of solvers. We observe tremendous differences in the difficulty that each approach has on these instances. Overall, the CFN approach shows the best efficiency on these problems, improving by several orders of magnitude against the exact DEE/A* approach. The introduction of dead-end elimination before or during search allows to further improve these results. (C) 2014 Elsevier B.V. All rights reserved.
引用
收藏
页码:59 / 79
页数:21
相关论文
共 90 条
[1]   Branching rules revisited [J].
Achterberg, T ;
Koch, T ;
Martin, A .
OPERATIONS RESEARCH LETTERS, 2005, 33 (01) :42-54
[2]  
Allouche David, 2012, Principles and Practice of Constraint Programming. Proceedings 18th International Conference, CP 2012, P840, DOI 10.1007/978-3-642-33558-7_60
[3]   PRINCIPLES THAT GOVERN FOLDING OF PROTEIN CHAINS [J].
ANFINSEN, CB .
SCIENCE, 1973, 181 (4096) :223-230
[4]  
Ansótegui C, 2009, LECT NOTES COMPUT SC, V5584, P427, DOI 10.1007/978-3-642-02777-2_39
[5]   Encoding Max-CSP into Partial Max-SAT [J].
Argelich, Josep ;
Cabiscol, Alba ;
Lynce, Ines ;
Manya, Felip .
38TH INTERNATIONAL SYMPOSIUM ON MULTIPLE-VALUED LOGIC (ISMVL 2008), 2008, :106-111
[6]  
Bacchus F, 2007, LECT NOTES COMPUT SC, V4741, P133
[7]   Potential energy functions for protein design [J].
Boas, F. Edward ;
Harbury, Pehr B. .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 2007, 17 (02) :199-204
[8]  
Boussemart F, 2004, FRONT ARTIF INTEL AP, V110, P146
[9]   A METHOD TO IDENTIFY PROTEIN SEQUENCES THAT FOLD INTO A KNOWN 3-DIMENSIONAL STRUCTURE [J].
BOWIE, JU ;
LUTHY, R ;
EISENBERG, D .
SCIENCE, 1991, 253 (5016) :164-170
[10]  
Campeotto F., 1991, SCIENCE, V253, P164