Probing the physical limits of reliable DNA data retrieval

Cited by: 90
Authors
Organick, Lee [1]
Chen, Yuan-Jyue [2]
Ang, Siena Dumas [2]
Lopez, Randolph [3]
Liu, Xiaomeng [1]
Strauss, Karin [2]
Ceze, Luis [1]
Affiliations
[1] Univ Washington, Paul G Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[2] Microsoft, Redmond, WA 98052 USA
[3] Univ Washington, Dept Bioengn, Seattle, WA 98195 USA
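Keywords
DIGITAL INFORMATION; STORAGE; ROBUST
DOI
10.1038/s41467-020-14319-8
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07; 0710; 09
Abstract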
Synthetic DNA is gaining momentum as a potential medium for archival data storage. In this process, digital information is translated into sequences of nucleotides, and the resulting synthetic DNA strands are then stored for later retrieval. Here, we demonstrate reliable file recovery with PCR-based random access when as few as ten copies per sequence are stored, on average. This results in a density of about 17 exabytes/gram, nearly two orders of magnitude greater than prior work has shown. We successfully retrieve the same data in a complex pool of over 10^10 unique sequences per microliter, with no evidence that we have begun to approach complexity limits. Finally, we also investigate the effects of file size and sequencing coverage on successful file retrieval and look for systematic DNA strand dropout. These findings substantiate the robustness and high data density of the process examined here. The physical limits and reliability of PCR-based random access of DNA-encoded data are unknown. Here the authors demonstrate reliable file recovery from as few as ten copies per sequence, providing a data density limit of 17 exabytes per gram.
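The link the abstract draws between copies stored per sequence and physical density can be illustrated with a back-of-the-envelope calculation. The following is a minimal sketch, not the authors' method: the strand length, payload per strand, and the roughly 330 g/mol average nucleotide mass are illustrative assumptions that do not come from this record, and the function names are hypothetical.

```python
# Rough physical-density estimate for DNA data storage.
# All parameter values below are illustrative assumptions, not figures from the paper.

AVOGADRO = 6.02214076e23  # molecules per mole

# Commonly used approximation for the average molar mass of one
# nucleotide in single-stranded DNA (g/mol). Assumption.
NT_MOLAR_MASS_G_PER_MOL = 330.0


def strand_mass_grams(length_nt):
    """Approximate mass in grams of one single-stranded DNA molecule."""
    return length_nt * NT_MOLAR_MASS_G_PER_MOL / AVOGADRO


def density_bytes_per_gram(payload_bytes_per_strand, length_nt, copies_per_sequence):
    """Bytes stored per gram when each unique sequence is kept in the given number of copies."""
    mass_per_sequence = copies_per_sequence * strand_mass_grams(length_nt)
    return payload_bytes_per_strand / mass_per_sequence


# Hypothetical strand design: 150 nt strands, each carrying 15 bytes of payload.
d10 = density_bytes_per_gram(15, 150, copies_per_sequence=10)
d1000 = density_bytes_per_gram(15, 150, copies_per_sequence=1000)

print(f"10 copies per sequence:   {d10 / 1e18:.2f} EB/g")
print(f"1000 copies per sequence: {d1000 / 1e18:.2f} EB/g")
```

Under these assumptions density is inversely proportional to the number of copies stored, which is why lowering the copy count needed for reliable retrieval translates directly into a higher density figure.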
Pages: 7