Optimizing nucleotide sequence ensembles for combinatorial protein libraries using a genetic algorithm

Cited by: 5
Authors
Craig, Roger A. [1 ]
Lu, Jin [2 ]
Luo, Jinquan [2 ]
Shi, Lei [2 ]
Liao, Li [1 ]
Affiliations
[1] Univ Delaware, Dept Comp & Informat Sci, Newark, DE 19716 USA
[2] Centocor Res & Dev Inc, Radnor, PA 19087 USA
Keywords
DESIGN;
DOI
10.1093/nar/gkp906
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Protein libraries are essential to the field of protein engineering. Increasingly, probabilistic protein design is being used to synthesize combinatorial protein libraries, which allow the protein engineer to explore a vast space of amino acid sequences while placing restrictions on the amino acid distributions. To this end, if site-specific amino acid probabilities are input as the target, then the codon nucleotide distributions that match this target distribution can be used to generate a partially randomized gene library. However, finding codon nucleotide distributions that exactly match a given target distribution of amino acids turns out to be a highly nontrivial computational task. We first showed that, for a given target distribution, an exact solution may not exist at all. Formulating the task as a constrained optimization problem, we then developed a genetic algorithm-based approach to find codon nucleotide distributions that match the target amino acid distribution as closely as possible. Compared with the previous gradient descent method on various objective functions, the new method consistently gave better-optimized distributions, as measured by the relative entropy between the calculated and the target distributions. To simulate the actual lab solutions, new objective functions were designed to allow for two separate sets of codons in seeking a better match to the target amino acid distribution.
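The optimization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard genetic code, independent nucleotide probabilities at each of the three codon positions, a relative entropy taken in the direction D(target || calculated), and a deliberately simple genetic algorithm (truncation selection with elitism, position-wise crossover, Gaussian mutation). All function names and parameter values are hypothetical.

```python
import itertools
import math
import random

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(itertools.product(BASES, repeat=3))}


def amino_acid_distribution(codon_probs):
    """Amino acid distribution induced by three per-position base distributions.

    codon_probs is a list of three dicts mapping base -> probability.
    """
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for pos, base in enumerate(codon):
            p *= codon_probs[pos].get(base, 0.0)
        dist[aa] = dist.get(aa, 0.0) + p
    return dist


def relative_entropy(target, calculated, eps=1e-12):
    """D(target || calculated) in nats (assumed direction); eps avoids log(0)."""
    return sum(t * math.log(t / max(calculated.get(aa, 0.0), eps))
               for aa, t in target.items() if t > 0)


def random_individual(rng):
    """One candidate solution: a random base distribution per codon position."""
    individual = []
    for _ in range(3):
        weights = [rng.random() for _ in BASES]
        total = sum(weights)
        individual.append({b: w / total for b, w in zip(BASES, weights)})
    return individual


def mutate(individual, rng, scale=0.1):
    """Add Gaussian noise, clip to a small positive floor, and renormalize."""
    child = []
    for position in individual:
        noisy = {b: max(p + rng.gauss(0.0, scale), 1e-9) for b, p in position.items()}
        total = sum(noisy.values())
        child.append({b: w / total for b, w in noisy.items()})
    return child


def genetic_search(target, pop_size=40, generations=200, seed=0):
    """Minimize the relative entropy to `target` with a simple elitist GA."""
    rng = random.Random(seed)

    def fitness(individual):
        return relative_entropy(target, amino_acid_distribution(individual))

    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 4]  # the best quarter survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            crossed = [rng.choice((pa, pb)) for pa, pb in zip(a, b)]
            children.append(mutate(crossed, rng))
        population = parents + children
    return min(population, key=fitness)


if __name__ == "__main__":
    # Hypothetical target: equal parts alanine (GCN) and glycine (GGN).
    target = {"A": 0.5, "G": 0.5}
    best = genetic_search(target)
    print(relative_entropy(target, amino_acid_distribution(best)))
```

Because the best quarter of each generation is carried over unchanged, the best score never gets worse from one generation to the next. The sketch optimizes a single set of codon distributions; the two-codon-set objective functions mentioned in the abstract are not covered.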
Pages: e10.1 / e10.9
Page count: 9
Related papers
14 in total
[1]   ANTIBODY ENGINEERING BY PARSIMONIOUS MUTAGENESIS [J].
BALINT, RF ;
LARRICK, JW .
GENE, 1993, 137 (01) :109-118
[2]   Yeast surface display for screening combinatorial polypeptide libraries [J].
Boder, ET ;
Wittrup, KD .
NATURE BIOTECHNOLOGY, 1997, 15 (06) :553-557
[3]   NOMENCLATURE FOR INCOMPLETELY SPECIFIED BASES IN NUCLEIC-ACID SEQUENCES - RECOMMENDATIONS 1984 [J].
CORNISHBOWDEN, A .
NUCLEIC ACIDS RESEARCH, 1985, 13 (09) :3021-3030
[4]   Protein design and phage display [J].
Hoess, RH .
CHEMICAL REVIEWS, 2001, 101 (10) :3205-3218
[5]   Scoring functions for computational algorithms applicable to the design of spiked oligonucleotides [J].
Jensen, LJ ;
Andersen, KV ;
Svendsen, A ;
Kretzschmar, T .
NUCLEIC ACIDS RESEARCH, 1998, 26 (03) :697-702
[6]   PROTEIN DESIGN BY BINARY PATTERNING OF POLAR AND NONPOLAR AMINO-ACIDS [J].
KAMTEKAR, S ;
SCHIFFER, JM ;
XIONG, HY ;
BABIK, JM ;
HECHT, MH .
SCIENCE, 1993, 262 (5140) :1680-1685
[7]   Statistical theory for protein combinatorial libraries. Packing interactions, backbone flexibility, and the sequence variability of a main-chain structure [J].
Kono, H ;
Saven, JG .
JOURNAL OF MOLECULAR BIOLOGY, 2001, 306 (03) :607-628
[8]   DESIGN OF SYNTHETIC GENE LIBRARIES ENCODING RANDOM SEQUENCE PROTEINS WITH DESIRED ENSEMBLE CHARACTERISTICS [J].
LABEAN, TH ;
KAUFFMAN, SA .
PROTEIN SCIENCE, 1993, 2 (08) :1249-1254
[9]  
LEVENSHT.VI, 1965, DOKL AKAD NAUK SSSR+, V163, P845
[10]   Automated design of degenerate codon libraries [J].
Mena, MA ;
Daugherty, PS .
PROTEIN ENGINEERING DESIGN & SELECTION, 2005, 18 (12) :559-561