Prediction of Protein Solubility in E. coli

Cited by: 0

Authors:

Samak, Taghrid [1]

Gunter, Dan [1]

Wang, Zhong [1]

Affiliations:

[1] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA

Source:

2012 IEEE 8TH INTERNATIONAL CONFERENCE ON E-SCIENCE (E-SCIENCE) | 2012

Keywords:

SEQUENCE-BASED PREDICTION; MACHINE-BASED METHOD; OVEREXPRESSION; PROPENSITY

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Gene synthesis is a key step in converting digitally predicted proteins into functional proteins. However, it is a relatively expensive and labor-intensive process. About 30-50% of the synthesized proteins are not soluble, which further reduces the efficacy of gene synthesis as a method for protein function characterization. Solubility prediction from primary protein sequences holds the promise of dramatically reducing the cost of gene synthesis. This work presents a framework that creates models of solubility from sequence information. From the primary protein sequences of the genes to be synthesized, sequence features can be used to build computational models of solubility. This way, biologists can focus their effort on synthesizing genes that are highly likely to generate soluble proteins. We have developed a framework that employs several machine learning algorithms to model protein solubility. The framework is used to predict protein solubility in the Escherichia coli expression system. The analysis is performed on over 1,600 quantified proteins. The approach predicted solubility with more than 80% accuracy and enabled in-depth analysis of the most important features affecting solubility. The analysis pipeline is general and can be applied to any set of sequence features to predict any binary measure. The framework also provides the biologist with a comprehensive comparison between different learning algorithms and insightful feature analysis.
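
The abstract does not say which sequence features or learning algorithms the framework uses, so the following is only a minimal sketch of the general idea: turn each primary sequence into a feature vector and fit a binary classifier on labeled examples. Amino acid composition as the feature set, the nearest-centroid classifier, the function names, and the toy sequences are all assumptions made for illustration; none of them come from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(sequence):
    """Return the fraction of each standard amino acid in a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def train_centroids(examples):
    """Average the feature vectors of each class (nearest-centroid model)."""
    sums, sizes = {}, {}
    for sequence, label in examples:
        feats = composition_features(sequence)
        prev = sums.get(label, [0.0] * len(feats))
        sums[label] = [a + b for a, b in zip(prev, feats)]
        sizes[label] = sizes.get(label, 0) + 1
    return {label: [v / sizes[label] for v in vec] for label, vec in sums.items()}


def predict(centroids, sequence):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    feats = composition_features(sequence)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(feats, centroids[label]))

    return min(centroids, key=distance)


# Toy placeholder data, NOT real measurements: True = soluble, False = insoluble.
training = [
    ("MKEEDKKEEDKE", True),
    ("MDEKKEDEEKKD", True),
    ("MLLIVFFLLIVV", False),
    ("MIVLLFFVILLV", False),
]
centroids = train_centroids(training)
print(predict(centroids, "MKEDKEEK"))
```

Because the model only sees a feature vector and a binary label, the same two steps would apply to any other feature set or binary measure, which matches the generality the abstract claims for its pipeline.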


Pages: 8
