Prediction of Protein Solubility in E. coli

Cited by: 0
Authors
Samak, Taghrid [1]
Gunter, Dan [1]
Wang, Zhong [1]
Affiliations
[1] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
Keywords
SEQUENCE-BASED PREDICTION; MACHINE-BASED METHOD; OVEREXPRESSION; PROPENSITY;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Gene synthesis is a key step in converting digitally predicted proteins into functional proteins. However, it is a relatively expensive and labor-intensive process. About 30-50% of synthesized proteins are not soluble, which further reduces the efficacy of gene synthesis as a method for characterizing protein function. Predicting solubility from primary protein sequences holds the promise of dramatically reducing the cost of gene synthesis. This work presents a framework that builds models of solubility from sequence information. Sequence features derived from the primary protein sequences of the genes to be synthesized are used to build computational models of solubility, so that biologists can focus their effort on synthesizing genes that are highly likely to yield soluble proteins. The framework employs several machine learning algorithms and is used to predict protein solubility in the Escherichia coli expression system. The analysis is performed on over 1,600 quantified proteins. The approach predicted solubility with more than 80% accuracy and enabled in-depth analysis of the features that most strongly affect solubility. The analysis pipeline is general and can be applied to any set of sequence features to predict any binary measure. The framework also provides the biologist with a comprehensive comparison of different learning algorithms and an insightful feature analysis.
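The pipeline described in the abstract (sequence features in, a binary solubility label out, several learning algorithms compared) can be illustrated with a minimal sketch. The abstract does not name the features or algorithms used, so everything here is an assumption chosen for illustration: the three features (charged fraction, aromatic fraction, mean Kyte-Doolittle hydropathy), the two classifiers (logistic regression and nearest centroid), the leave-one-out evaluation, and the six toy sequences with their labels, which are invented and are not data from the paper.

```python
import math

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def sequence_features(seq):
    """Return [charged fraction, aromatic fraction, mean hydropathy]."""
    seq = seq.upper()
    n = len(seq)
    charged = sum(seq.count(a) for a in "DEKR") / n
    aromatic = sum(seq.count(a) for a in "FWY") / n
    # Non-standard residue codes contribute 0 to the hydropathy sum.
    gravy = sum(KD.get(a, 0.0) for a in seq) / n
    return [charged, aromatic, gravy]

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return lambda x: int(sum(wj * xj for wj, xj in zip(w, x)) + b >= 0)

def fit_centroid(X, y):
    """Nearest-centroid classifier: predict the class with the closer mean."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0 = mean([x for x, t in zip(X, y) if t == 0])
    c1 = mean([x for x, t in zip(X, y) if t == 1])
    return lambda x: int(math.dist(x, c1) < math.dist(x, c0))

def loo_accuracy(fit, X, y):
    """Leave-one-out accuracy: train on all but one protein, test on it."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += model(X[i]) == y[i]
    return hits / len(X)

# Invented toy sequences and labels (1 = soluble, 0 = insoluble); not real data.
TOY = [
    ("MKEEDKRSEEDKKDE", 1), ("MSDEKKREEDSKDEK", 1), ("MEEKRDDSKEEKKDS", 1),
    ("MLLIVVAFLLIVAGL", 0), ("MIVLLAFFVVILLAV", 0), ("MAVILLFVIALLVIF", 0),
]
X = [sequence_features(s) for s, _ in TOY]
y = [label for _, label in TOY]

# Compare the two learning algorithms on the same features.
RESULTS = {name: loo_accuracy(fit, X, y)
           for name, fit in [("logistic", fit_logistic), ("centroid", fit_centroid)]}
for name, acc in RESULTS.items():
    print(f"{name}: leave-one-out accuracy = {acc:.2f}")
```

Because each classifier is just a function that takes features and labels and returns a predictor, adding another algorithm or another feature set only changes the list passed to the comparison. The toy set is deliberately easy to separate; a realistic evaluation would use measured solubility labels and a proper held-out or cross-validated split.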
Pages: 8