Machine learning-aided scoring of synthesis difficulties for designer chromosomes

Cited by: 6
Authors
Zheng, Yan [1, 2, 3]
Song, Kai [1, 2, 3]
Xie, Ze-Xiong [1, 2, 3]
Han, Ming-Zhe [1, 2, 3]
Guo, Fei [1, 2, 4]
Yuan, Ying-Jin [1, 2, 3]
Affiliations
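[1] Tianjin University, Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin 300072, People's Republic of China
[2] Tianjin University, Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin 300072, People's Republic of China
[3] Tianjin University, School of Chemical Engineering and Technology, Tianjin 300072, People's Republic of China
[4] Central South University, School of Computer Science and Engineering, Changsha 410083, People's Republic of China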
Funding
National Natural Science Foundation of China
Keywords
synthetic biology; machine learning; artificial chromosome; chemical synthesis; genome; sequence; strands
DOI
10.1007/s11427-023-2306-x
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Designer chromosomes are artificially synthesized chromosomes. Nowadays, these chromosomes have numerous applications ranging from medical research to the development of biofuels. However, some chromosome fragments can interfere with the chemical synthesis of designer chromosomes and eventually limit the widespread use of this technology. To address this issue, this study aimed to develop an interpretable machine learning framework to predict and quantify the synthesis difficulties of designer chromosomes in advance. Through the use of this framework, six key sequence features leading to synthesis difficulties were identified, and an eXtreme Gradient Boosting model was established to integrate these features. The predictive model achieved high-quality performance with an AUC of 0.895 in cross-validation and an AUC of 0.885 on an independent test set. Based on these results, the synthesis difficulty index (S-index) was proposed as a means of scoring and interpreting synthesis difficulties of chromosomes from prokaryotes to eukaryotes. The findings of this study emphasize the significant variability in synthesis difficulties between chromosomes and demonstrate the potential of the proposed model to predict and mitigate these difficulties through the optimization of the synthesis process and genome rewriting.
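The abstract reports performance as an AUC over sequence-derived features but does not name the six features or give the model settings. The following is only a minimal sketch of the two ingredients involved: computing simple features from a DNA fragment and scoring a ranking by AUC. The features `gc_content` and `longest_homopolymer` are hypothetical stand-ins chosen for illustration, not the features identified in the study, and the plain pairwise AUC here is not the authors' eXtreme Gradient Boosting pipeline.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of G/C bases. Hypothetical feature, not confirmed by the source."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base. Hypothetical feature."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def auc(labels, scores):
    """ROC AUC as the probability that a positive example outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In this sketch a label of 1 would mark a fragment that was difficult to synthesize, and `scores` would come from whatever model combines the features. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported values of 0.895 and 0.885.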
Pages: 1615-1625
Number of pages: 11