Robust data storage in DNA by de Bruijn graph-based de novo strand assembly

Cited by: 51
Authors
Song, Lifu [1 ,2 ,3 ]
Geng, Feng [4 ]
Gong, Zi-Yi [1 ,2 ,3 ]
Chen, Xin [5 ]
Tang, Jijun [6 ,7 ]
Gong, Chunye [8 ]
Zhou, Libang [9 ]
Xia, Rui [8 ]
Han, Ming-Zhe [1 ,2 ,3 ]
Xu, Jing-Yi [1 ,2 ,3 ]
Li, Bing-Zhi [1 ,2 ,3 ]
Yuan, Ying-Jin [1 ,2 ,3 ]
Affiliations
[1] Tianjin Univ, Frontiers Sci Ctr Synthet Biol, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Key Lab Syst Bioengn, Minist Educ, Tianjin 300072, Peoples R China
[3] Tianjin Univ, Sch Chem Engn & Technol, Tianjin 300072, Peoples R China
[4] Binzhou Med Univ, Coll Pharm, Yantai 264003, Shandong, Peoples R China
[5] Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China
[6] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China
[7] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[8] Natl SuperComp Ctr Tianjin, Tianjin 300457, Peoples R China
[9] Nanjing Agr Univ, Coll Food Sci & Technol, Nanjing 210095, Jiangsu, Peoples R China
Keywords
MULTIPLE SEQUENCE ALIGNMENT; DIGITAL INFORMATION; SYNTHETIC DNA; ERROR RATES; RECONSTRUCTION;
DOI
10.1038/s41467-022-33046-w
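Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes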
07 ; 0710 ; 09 ;
Abstract
DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges include various errors, such as strand breaks, rearrangements, and indels, that frequently arise during DNA synthesis, amplification, sequencing, and preservation. In this study, a de novo strand assembly algorithm (DBGPS) is developed using a de Bruijn graph and greedy path search to meet these challenges. DBGPS shows substantial advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large-scale simulations. Remarkably, 6.8 MB of data is accurately recovered from a severely corrupted sample that has been treated at 70 °C for 70 days. With DBGPS, we are able to achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g.
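The general idea named in the abstract, assembling a strand de novo from a de Bruijn graph by greedy path search, can be illustrated with a toy sketch. This is not the authors' DBGPS implementation. The function names, the known seed prefix, the fixed strand length, and the `min_count` threshold are all assumptions made for illustration. The sketch counts k-mers across fragmented reads, discards rare k-mers as likely errors, and extends a seed one base at a time along the most frequent edge.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer in the reads; each k-mer is an edge of the de Bruijn graph."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def greedy_assemble(counts, seed, k, length, min_count=2):
    """Extend `seed` (at least k-1 bases) by following the most frequent k-mer.

    k-mers seen fewer than `min_count` times are treated as noise (an assumed
    threshold). Returns None if the path dead-ends before `length` bases.
    """
    strand = seed
    while len(strand) < length:
        suffix = strand[-(k - 1):]
        candidates = [(counts[suffix + base], base)
                      for base in "ACGT"
                      if counts[suffix + base] >= min_count]
        if not candidates:
            return None
        strand += max(candidates)[1]  # greedy choice: highest-count edge
    return strand

# Hypothetical toy data: fragments of a 14-nt strand, none covering it fully,
# plus one read carrying a substitution error.
reads = [
    "ACGTTGCAA",
    "TTGCAAGGCTA",
    "ACGTTGC",
    "GCAAGGCTA",
    "ACGTAGCAAGG",  # erroneous read
]
counts = count_kmers(reads, k=5)
assembled = greedy_assemble(counts, seed="ACGT", k=5, length=14)
print(assembled)
```

In this toy case the strand is rebuilt although every read is a broken fragment, because overlapping k-mers from different fragments link up in the graph. The k-mers introduced by the erroneous read occur only once and are filtered out.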
Pages: 9