SoluProt: prediction of soluble protein expression in Escherichia coli

Cited by: 122
Authors
Hon, Jiri [1, 2, 3, 4]
Marusiak, Martin [4]
Martinek, Tomas [4]
Kunka, Antonin [1, 2, 3]
Zendulka, Jaroslav [4]
Bednar, David [1, 2, 3]
Damborsky, Jiri [1, 2, 3]
Affiliations
[1] Masaryk Univ, Fac Sci, Loschmidt Labs, Ctr Tox Cpds Environm RECETOX, Brno 62500, Czech Republic
[2] Masaryk Univ, Fac Sci, Dept Expt Biol, Brno 62500, Czech Republic
[3] St Annes Univ Hosp Brno, Int Clin Res Ctr, Brno 65691, Czech Republic
[4] Brno Univ Technol, Fac Informat Technol, IT4Innovat Ctr Excellence, Brno 61266, Czech Republic
Keywords
SOLUBILITY; WEBSERVER; ACCURATE; TOPOLOGY
DOI
10.1093/bioinformatics/btaa1102
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritization of highly soluble proteins.
Results: A new tool for sequence-based prediction of soluble protein expression in E. coli, SoluProt, was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt's accuracy of 58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility prediction tools. There is also evidence that it could significantly increase the success rate of experimental protein studies.
Pages: 23-28
Number of pages: 6