Scalable and robust DNA-based storage via coding theory and deep learning

Cited by: 2
Authors
Bar-Lev, Daniella [1]
Orr, Itai [1,2]
Sabary, Omer [1]
Etzion, Tuvi [1]
Yaakobi, Eitan [1]
Affiliations
[1] Technion Israel Inst Technol, Comp Sci Fac, Haifa, Israel
[2] UVeye Ltd, Tel Aviv, Israel
Funding
European Research Council;
Keywords
DIGITAL INFORMATION;
DOI
10.1038/s42256-025-01003-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The global data sphere is expanding exponentially, projected to hit 180 zettabytes by 2025, whereas current technologies are not anticipated to scale at nearly the same rate. DNA-based storage emerges as a crucial solution to this gap, enabling digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions such as exceptional information density, enhanced data durability and negligible power consumption to maintain data integrity. To access the data, an information retrieval process is employed, where some of the main bottlenecks are the scalability and accuracy, which have a natural tradeoff between the two. Here we show a modular and holistic approach that combines deep neural networks trained on simulated data, tensor product-based error-correcting codes and a safety margin mechanism into a single coherent pipeline. We demonstrated our solution on 3.1 MB of information using two different sequencing technologies. Our work improves upon the current leading solutions with a 3,200x increase in speed and a 40% improvement in accuracy and offers a code rate of 1.6 bits per base in a high-noise regime. In a broader sense, our work shows a viable path to commercial DNA storage solutions hindered by current information retrieval processes.
Pages: 639-649
Page count: 11
Related papers
55 in total
[1]  
Anavy L, 2019, NAT BIOTECHNOL, V37, P1237, DOI 10.1038/s41587-019-0281-1
[2]  
[Anonymous], 2021, Preserving Our Digital Legacy: an Introduction to DNA Data Storage
[3]  
Bar-Lev D., 2024, Zenodo, DOI 10.5281/zenodo.14296588
[4]  
Bar-Lev D., 2024, **DATA OBJECT**, DOI 10.5281/zenodo.14266018
[5]  
Bar-Lev D., 2024, **DATA OBJECT**, DOI 10.5281/zenodo.13896773
[6]  
Batu T., 2004, PROC ACM SIAM S DIS, P910
[7]   Molecular-level similarity search brings computing to DNA data storage [J].
Bee, Callista ;
Chen, Yuan-Jyue ;
Queen, Melissa ;
Ward, David ;
Liu, Xiaomeng ;
Organick, Lee ;
Seelig, Georg ;
Strauss, Karin ;
Ceze, Luis .
NATURE COMMUNICATIONS, 2021, 12 (01)
[8]   Forward Error Correction for DNA Data Storage [J].
Blawat, Meinolf ;
Gaedke, Klaus ;
Huetter, Ingo ;
Chen, Xiao-Ming ;
Turczyk, Brian ;
Inverso, Samuel ;
Pruitt, Benjamin W. ;
Church, George M. .
INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016), 2016, 80 :1011-1022
[9]  
Bohlin J., 2019, Big Data Anal, V4, P1, DOI 10.1186/s41044-019-0042-7
[10]  
Bornholt J, 2016, ACM SIGPLAN NOTICES, V51, P637, DOI [10.1145/2872362.2872397, 10.1145/2954679.2872397]