Machine learning-aided scoring of synthesis difficulties for designer chromosomes

Cited by: 6
Authors
Zheng, Yan [1, 2, 3]
Song, Kai [1, 2, 3]
Xie, Ze-Xiong [1, 2, 3]
Han, Ming-Zhe [1, 2, 3]
Guo, Fei [1, 2, 4]
Yuan, Ying-Jin [1, 2, 3]
Affiliations
[1] Tianjin Univ, Frontiers Sci Ctr Synthet Biol, Minist Educ, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Key Lab Syst Bioengn, Minist Educ, Tianjin 300072, Peoples R China
[3] Tianjin Univ, Sch Chem Engn & Technol, Tianjin 300072, Peoples R China
[4] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
synthetic biology; machine learning; artificial chromosome; chemical synthesis; genome; sequence; strands
DOI
10.1007/s11427-023-2306-x
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Designer chromosomes are artificially synthesized chromosomes. Nowadays, these chromosomes have numerous applications ranging from medical research to the development of biofuels. However, some chromosome fragments can interfere with the chemical synthesis of designer chromosomes and eventually limit the widespread use of this technology. To address this issue, this study aimed to develop an interpretable machine learning framework to predict and quantify the synthesis difficulties of designer chromosomes in advance. Through the use of this framework, six key sequence features leading to synthesis difficulties were identified, and an eXtreme Gradient Boosting model was established to integrate these features. The predictive model achieved high-quality performance with an AUC of 0.895 in cross-validation and an AUC of 0.885 on an independent test set. Based on these results, the synthesis difficulty index (S-index) was proposed as a means of scoring and interpreting synthesis difficulties of chromosomes from prokaryotes to eukaryotes. The findings of this study emphasize the significant variability in synthesis difficulties between chromosomes and demonstrate the potential of the proposed model to predict and mitigate these difficulties through the optimization of the synthesis process and genome rewriting.
Pages: 1615-1625
Number of pages: 11