Robust data storage in DNA by de Bruijn graph-based de novo strand assembly

Times cited: 51
Authors
Song, Lifu [1,2,3]
Geng, Feng [4]
Gong, Zi-Yi [1,2,3]
Chen, Xin [5]
Tang, Jijun [6,7]
Gong, Chunye [8]
Zhou, Libang [9]
Xia, Rui [8]
Han, Ming-Zhe [1,2,3]
Xu, Jing-Yi [1,2,3]
Li, Bing-Zhi [1,2,3]
Yuan, Ying-Jin [1,2,3]
Affiliations
[1] Tianjin Univ, Frontiers Sci Ctr Synthet Biol, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Key Lab Syst Bioengn, Minist Educ, Tianjin 300072, Peoples R China
[3] Tianjin Univ, Sch Chem Engn & Technol, Tianjin 300072, Peoples R China
[4] Binzhou Med Univ, Coll Pharm, Yantai 264003, Shandong, Peoples R China
[5] Tianjin Univ, Center Appl Math, Tianjin 300072, Peoples R China
[6] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China
[7] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[8] Natl SuperComp Ctr Tianjin, Tianjin 300457, Peoples R China
[9] Nanjing Agr Univ, Coll Food Sci & Technol, Nanjing 210095, Jiangsu, Peoples R China
Keywords
MULTIPLE SEQUENCE ALIGNMENT; DIGITAL INFORMATION; SYNTHETIC DNA; ERROR RATES; RECONSTRUCTION
DOI
10.1038/s41467-022-33046-w
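Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract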
DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges are the various errors, such as strand breaks, rearrangements, and indels, that frequently arise during DNA synthesis, amplification, sequencing, and preservation. In this study, a de novo strand assembly algorithm (DBGPS) is developed using a de Bruijn graph and greedy path search to meet these challenges. DBGPS shows substantial advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large-scale simulations. Remarkably, 6.8 MB of data is accurately recovered from a severely corrupted sample that had been treated at 70 °C for 70 days. With DBGPS, we achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g.
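The abstract names the two ingredients of DBGPS, a de Bruijn graph and a greedy path search, but this record does not describe how they are implemented. The sketch below only illustrates the general idea under stated assumptions: k-mers are counted across all reads (each k-mer is an edge between its (k-1)-mer prefix and suffix), and a strand is rebuilt by walking from a known seed and always taking the best-supported outgoing edge. The seed, the known strand length, the fixed `k`, the coverage threshold `min_count`, and the function names `count_kmers` and `greedy_assemble` are all hypothetical choices for illustration, not the authors' code.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads. In de Bruijn graph terms, each
    k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_assemble(reads, seed, length, k=5, min_count=2):
    """Hypothetical greedy path search: extend `seed` one base at a time,
    always following the most frequent outgoing k-mer, until `length`
    bases are assembled. Returns None if the walk reaches a dead end."""
    if len(seed) < k - 1:
        raise ValueError("seed must be at least k-1 bases long")
    counts = count_kmers(reads, k)
    strand = seed
    while len(strand) < length:
        node = strand[-(k - 1):]  # current (k-1)-mer node in the graph
        # Outgoing edges with enough coverage; low-count k-mers are assumed
        # to come from synthesis/sequencing errors and are ignored.
        candidates = [(counts[node + b], b) for b in "ACGT"
                      if counts[node + b] >= min_count]
        if not candidates:
            return None  # no well-supported extension
        _, best = max(candidates)  # highest count wins (ties: larger base letter)
        strand += best
    return strand


# Toy usage: two intact copies, two broken fragments, and one read with a
# substitution error at position 11.
true = "ACGTTGCAAGGCTTACGGATC"
err = true[:11] + "A" + true[12:]
reads = [true, true, true[:12], true[8:], err]
print(greedy_assemble(reads, "ACGT", len(true)))  # ACGTTGCAAGGCTTACGGATC
```

In this toy case, the k-mers created by the substitution each occur only once, so the `min_count` filter discards them, while the broken fragments still add coverage to the correct edges. This is one plausible reason a graph-based walk tolerates breaks and indels better than aligning whole reads, but the real DBGPS may differ in seeding, scoring, and verification.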
Pages: 9