Molecular-level similarity search brings computing to DNA data storage

Times cited: 58
Authors
Bee, Callista [1]
Chen, Yuan-Jyue [2]
Queen, Melissa [1]
Ward, David [1]
Liu, Xiaomeng [1]
Organick, Lee [1]
Seelig, Georg [1, 3]
Strauss, Karin [2]
Ceze, Luis [1]
Affiliations
[1] University of Washington, Paul G. Allen School of Computer Science & Engineering, Seattle, WA 98195, USA
[2] Microsoft Research, Redmond, WA 98052, USA
[3] University of Washington, Department of Electrical & Computer Engineering, Seattle, WA 98195, USA
Keywords
Digital information; circuit; robust
DOI
10.1038/s41467-021-24991-z
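Chinese Library Classification (CLC) codes
O [Mathematics, physics, and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences in general]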
Subject classification codes
07; 0710; 09
Abstract
As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error-correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as similarity search: e.g., finding images that look similar to an image of interest without prior knowledge of their filenames. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach was to learn an image-to-sequence encoding ensuring that queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.

Storage technology based on DNA is emerging as an information-dense and durable medium. Here the authors use machine learning-based encoding and hybridization probes to execute similarity searches in a DNA database.
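The abstract's central idea is an encoding under which a query probe binds more strongly to the sequences of visually similar images. The toy in silico sketch below illustrates that idea under loudly stated assumptions. It assumes each image has already been reduced to a feature vector. It substitutes a fixed sign-random-projection hash (a standard locality-sensitive hashing technique) for the learned image-to-sequence encoder the authors train. It also scores "binding" as the fraction of positions that pair with the probe, which is a crude proxy and not real hybridization thermodynamics. The function names and the feature vectors are hypothetical, and none of this is the paper's actual method.

```python
import random

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def make_projections(dim, length, seed=0):
    """Hypothetical encoder parameters: two random hyperplanes per base position."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)]
            for _ in range(length)]


def encode(vec, projections):
    """Map a feature vector to a DNA sequence. At each position, the signs of
    two random projections (sign-random-projection LSH) select one of four bases,
    so vectors separated by a small angle tend to share bases at more positions."""
    seq = []
    for plane_a, plane_b in projections:
        bit_a = int(sum(x * w for x, w in zip(vec, plane_a)) >= 0)
        bit_b = int(sum(x * w for x, w in zip(vec, plane_b)) >= 0)
        seq.append(BASES[2 * bit_a + bit_b])
    return "".join(seq)


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def binding_score(probe, target):
    """Crude hybridization proxy (not thermodynamics): the fraction of positions
    at which the target matches the site the probe would pair with perfectly."""
    site = reverse_complement(probe)
    return sum(a == b for a, b in zip(site, target)) / len(target)


def similarity_search(query_vec, database, projections, top_k=3):
    """Build a probe from the query and rank stored targets by binding score."""
    probe = reverse_complement(encode(query_vec, projections))
    scored = sorted(((binding_score(probe, seq), key) for key, seq in database.items()),
                    reverse=True)
    return [key for _, key in scored[:top_k]]


# Toy usage with made-up feature vectors (stand-ins for image embeddings).
projections = make_projections(dim=4, length=20)
images = {
    "cat_1": [0.9, 0.1, 0.0, 0.2],
    "cat_2": [0.8, 0.2, 0.1, 0.3],
    "car_1": [-0.7, 0.9, -0.5, 0.0],
}
database = {key: encode(vec, projections) for key, vec in images.items()}
print(similarity_search([0.85, 0.15, 0.05, 0.25], database, projections, top_k=2))
```

Each base is chosen by which side of two random hyperplanes a vector falls on, so vectors with a small angle between them tend to receive the same base at more positions. A probe built from a query therefore pairs with more positions of a similar image's target. In the system described above, the encoder is learned rather than random, and binding is measured through molecular hybridization, which this sketch does not model.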
Pages: 9