Prediction of Protein Solubility in E. coli

Times cited: 0
Authors
Samak, Taghrid [1]
Gunter, Dan [1]
Wang, Zhong [1]
Affiliations
[1] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
Keywords
SEQUENCE-BASED PREDICTION; MACHINE-BASED METHOD; OVEREXPRESSION; PROPENSITY;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Gene synthesis is a key step in converting digitally predicted proteins into functional proteins. However, it is a relatively expensive and labor-intensive process. About 30-50% of the synthesized proteins are not soluble, which further reduces the efficacy of gene synthesis as a method for protein function characterization. Solubility prediction from primary protein sequences holds the promise of dramatically reducing the cost of gene synthesis. This work presents a framework that creates models of solubility from sequence information. From the primary protein sequences of the genes to be synthesized, sequence features can be used to build computational models of solubility. This way, biologists can focus their effort on synthesizing genes that are highly likely to generate soluble proteins. We have developed a framework that employs several machine learning algorithms to model protein solubility. The framework is used to predict protein solubility in the Escherichia coli expression system. The analysis is performed on over 1,600 quantified proteins. The approach predicted solubility with more than 80% accuracy and enabled in-depth analysis of the most important features affecting solubility. The analysis pipeline is general and can be applied to any set of sequence features to predict any binary measure. The framework also provides the biologist with a comprehensive comparison of different learning algorithms and insightful feature analysis.
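The pipeline outlined in the abstract (sequence features in, a binary solubility label out, several learning algorithms compared) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not list the features or algorithms the authors used, so the amino-acid composition features, the `HYDROPHOBIC` and `CHARGED` residue groupings, the two toy classifiers, and the eight made-up sequences are hypothetical stand-ins rather than the authors' method or data.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # assumed grouping, not taken from the paper
CHARGED = set("DEKR")          # assumed grouping, not taken from the paper


def sequence_features(seq):
    """Map a protein sequence to 20 composition fractions plus 2 group fractions."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    composition = [counts[a] / n for a in AMINO_ACIDS]
    hydrophobic = sum(counts[a] for a in HYDROPHOBIC) / n
    charged = sum(counts[a] for a in CHARGED) / n
    return composition + [hydrophobic, charged]


def majority_fit(X, y):
    """Baseline: always predict the most frequent training label."""
    label = 1 if 2 * sum(y) >= len(y) else 0
    return lambda x: label


def centroid_fit(X, y):
    """Nearest-centroid classifier: predict the class whose mean vector is closest."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    centroids = {c: mean([x for x, t in zip(X, y) if t == c]) for c in set(y)}

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def cross_val_accuracy(fit, X, y, k=4):
    """k-fold cross-validation accuracy; sample i is held out in fold i % k."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in test)
    return correct / len(X)


# Made-up sequences for illustration only (1 = soluble, 0 = insoluble).
toy = [
    ("DEKRDEKRSTNQ", 1), ("LLIVVFFWAAML", 0),
    ("KKEEDDRRSSGG", 1), ("VVIILLFFMMAA", 0),
    ("EDKRTSEDKRNQ", 1), ("FWLLIVAVLLIM", 0),
    ("RKDEGSRKDENT", 1), ("AILVFMWAILVF", 0),
]
X = [sequence_features(s) for s, _ in toy]
y = [label for _, label in toy]

# Compare the algorithms on identical folds, as a model-comparison step would.
results = {
    "majority": cross_val_accuracy(majority_fit, X, y),
    "nearest_centroid": cross_val_accuracy(centroid_fit, X, y),
}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Because `cross_val_accuracy` only needs a `fit` function that returns a predictor, other classifiers or other binary labels could be swapped in without changing the rest of the sketch, which mirrors the abstract's claim that the pipeline is general.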
Pages: 8
Related papers
50 in total
  • [21] Structure of the E. coli DNA repair protein RarA
    Yu, Hongyu
    Cox, Michael
    Grant, Timothy
    Ducos, Peter
    JOURNAL OF BIOLOGICAL CHEMISTRY, 2023, 299 (03) : S582 - S582
  • [22] EcoliOverExpressionDB: A database of recombinant protein overexpression in E. coli
    Habibi, Narjeskhatoon
    Samian, Mohd Razip
    Hashim, Siti Zaiton Mohd
    Norouzi, Alireza
    PROTEIN EXPRESSION AND PURIFICATION, 2014, 95 : 92 - 95
  • [23] Physiological Response to Membrane Protein Overexpression in E. coli
    Gubellini, Francesca
    Verdon, Gregory
    Karpowich, Nathan K.
    Luff, Jon D.
    Boel, Gregory
    Gauthier, Nils
    Handelman, Samuel K.
    Ades, Sarah E.
    Hunt, John F.
    MOLECULAR & CELLULAR PROTEOMICS, 2011, 10 (10)
  • [24] The E. coli Effector Protein NleF Is a Caspase Inhibitor
    Blasche, Sonja
    Moertl, Mario
    Steuber, Holger
    Siszler, Gabriella
    Nisa, Shahista
    Schwarz, Frank
    Lavrik, Inna
    Gronewold, Thomas M. A.
    Maskos, Klaus
    Donnenberg, Michael S.
    Ullmann, Dirk
    Uetz, Peter
    Koegl, Manfred
    PLOS ONE, 2013, 8 (03):
  • [25] Crystal structure and nanoanatomy of the E. coli RecA protein
    Roca, Alberto I.
    Larson, Steven
    McPherson, Alexander
    FASEB JOURNAL, 2008, 22
  • [26] Protein homeostasis sculpts fitness landscape of E. coli
    Bershtein, Shimon
    Mu, Wanmeng
    Shakhnovich, Eugene
    PROTEIN SCIENCE, 2012, 21 : 173 - 173
  • [27] Characterization of the E. coli SOS response protein YbfE
    Kramer, Caitlin
    Muenter, Mark
    Leifer, Becky
    Travers, Meghan
    Judge, Richard
    Beuning, Penny
    FASEB JOURNAL, 2016, 30
  • [28] PROTEIN MOIETIES OF CHLORAMPHENICOL PARTICLES OF E. COLI B
    FENSKE, H
    VENKER, P
    LINDIGKEIT, R
    ACTA BIOLOGICA ET MEDICA GERMANICA, 1968, 20 (03) : K5 - +
  • [29] Isobaric tags for relative protein quantitation in E. coli
    Aggarwal, K
    Choe, LH
    Franck, Z
    Lee, KH
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2005, 229 : U245 - U245
  • [30] Prediction of Protein Solubility in Escherichia coli Using Logistic Regression
    Diaz, Armando A.
    Tomba, Emanuele
    Lennarson, Reese
    Richard, Rex
    Bagajewicz, Miguel J.
    Harrison, Roger G.
    BIOTECHNOLOGY AND BIOENGINEERING, 2010, 105 (02) : 374 - 383