Design and structure of overlapping regions in PCA via deep learning

Cited by: 0
Authors
Zheng, Yan [1, 2, 3]
Cui, Xi-Chen [1, 2, 3]
Guo, Fei [1, 2, 4]
Dou, Ming-Liang [1, 2]
Xie, Ze-Xiong [1, 2, 3]
Yuan, Ying-Jin [1, 2, 3]
Affiliations
[1] Tianjin Univ, Frontiers Sci Ctr Synthet Biol, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Key Lab Syst Bioengn, Minist Educ, Tianjin 300072, Peoples R China
[3] Tianjin Univ, Sch Chem Engn & Technol, Tianjin 300072, Peoples R China
[4] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Synthetic biology; PCA; Deep learning; Molecular dynamics; GENE SYNTHESIS; BASE-STACKING; DNA; VISUALIZATION; GENEDESIGN; STABILITY; HYBRID; SYSTEM;
DOI
10.1016/j.synbio.2024.12.007
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Polymerase cycling assembly (PCA) is the predominant method for synthesizing kilobase-length DNA fragments, and the design of the overlapping regions is the core factor affecting the success rate of synthesis. However, some DNA sequences remain challenging to design and construct in genome synthesis. Here, we proposed a deep learning model, trained on extensive synthesis data, that discerns latent sequence representations in overlapping regions and achieves an area under the precision-recall curve (AUPR) of 0.805. Using the model, we developed the SmartCut algorithm to design oligonucleotides and improve the success rate of PCA experiments. The algorithm was successfully applied to sequences with diverse synthesis constraints, 80.4% of which were synthesized in a single round. We further discovered structural differences between overlapping and non-overlapping regions, represented by major groove width, stagger, slide, and centroid distance, which support the model's reasonableness from the perspective of physical chemistry. This comprehensive approach facilitates streamlined and efficient investigations into genome synthesis.
Pages: 442-451
Page count: 10