Protecting the Future of Information: LOCO Coding With Error Detection for DNA Data Storage

被引:1
作者
Irimagzi, Canberk [1 ]
Uslan, Yusuf [2 ]
Hareedy, Ahmed [2 ]
机构
[1] Middle East Tech Univ, Inst Appl Math, TR-06800 Ankara, Turkiye
[2] Middle East Tech Univ, Dept Elect & Elect Engn, TR-06800 Ankara, Turkiye
关键词
Codes; DNA; Encoding; Symbols; Memory; Table lookup; Signal processing algorithms; Constrained codes; low-complexity algorithms; reconfigurable coding; LOCO codes; homopolymer run; balancing; error-detection; DNA data storage; APPROACHING CONSTRAINED CODES; RUN-LENGTH; CAPACITY;
D O I
10.1109/TMBMC.2024.3400794
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
From the information-theoretic perspective, DNA strands serve as a storage medium for 4-ary data over the alphabet {A, T, G, C} . DNA data storage promises formidable information density, long-term durability, and ease of replicability. However, information in this intriguing storage technology might be corrupted because of error-prone data sequences as well as insertion, deletion, and substitution errors. Experiments have revealed that DNA sequences with long homopolymers and/or with low GC-content are notably more subject to errors upon storage. In order to address this biochemical challenge, constrained codes are proposed for usage in DNA data storage systems, and they are studied in the literature accordingly. This paper investigates the utilization of the recently-introduced method for designing lexicographically-ordered constrained (LOCO) codes in DNA data storage to improve performance. LOCO codes offer capacity-achievability, low complexity, and ease of reconfigurability. This paper introduces novel constrained codes, namely DNA LOCO (D-LOCO) codes, over the alphabet {A, T, G, C} with limited runs of identical symbols. Due to their ordered structure, these codes come with an encoding-decoding rule we derive, which provides simple and affordable encoding-decoding algorithms. In terms of storage overhead, the proposed encoding-decoding algorithms outperform those in the existing literature. Our algorithms are based on small-size adders, and therefore they are readily reconfigurable. D-LOCO codes are intrinsically balanced, which allows us to achieve balanced AT- and GC-content over the entire DNA strand with minimal rate penalty. Moreover, we propose four schemes to bridge consecutive codewords, three of which guarantee single substitution error detection per codeword. We examine the probability of undetecting errors over a presumed symmetric DNA storage channel subject to substitution errors only. We also show that D-LOCO codes are capacity-achieving and that they offer remarkably high rates even at moderate lengths.
引用
收藏
页码:317 / 333
页数:17
相关论文
共 39 条
[1]   AN APPLICATION OF SYMBOLIC DYNAMICS TO INFORMATION-THEORY [J].
ADLER, RL ;
COPPERSMITH, D ;
HASSNER, M .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1983, 29 (01) :5-22
[2]   THE ENUMERATION OF CERTAIN RUN LENGTH SEQUENCES [J].
BLAKE, IF .
INFORMATION AND CONTROL, 1982, 55 (1-3) :222-237
[3]   An enumerative coding technique for DC-free runlength-limited sequences [J].
Braun, V ;
Immink, KAS .
IEEE TRANSACTIONS ON COMMUNICATIONS, 2000, 48 (12) :2024-2031
[4]   Power Spectra of Constrained Codes With Level-Based Signaling: Overcoming Finite-Length Challenges [J].
Centers, Jessica ;
Tan, Xinyu ;
Hareedy, Ahmed ;
Calderbank, Robert .
IEEE TRANSACTIONS ON COMMUNICATIONS, 2021, 69 (08) :4971-4986
[5]  
COVER TM, 1973, IEEE T INFORM THEORY, V19, P73, DOI 10.1109/TIT.1973.1054929
[6]   Non-Binary Constrained Codes for Two-Dimensional Magnetic Recording [J].
Dabak, Beyza ;
Hareedy, Ahmed ;
Calderbank, Robert .
IEEE TRANSACTIONS ON MAGNETICS, 2020, 56 (11)
[7]   SEQUENCE-STATE METHODS FOR RUN-LENGTH-LIMITED CODING [J].
FRANASZEK, PA .
IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 1970, 14 (04) :376-+
[8]   A NEW APPROACH TO CONSTRUCTING OPTIMAL BLOCK-CODES FOR RUNLENGTH-LIMITED CHANNELS [J].
GU, J ;
FUJA, TE .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1994, 40 (03) :774-785
[9]   Efficient Constrained Codes That Enable Page Separation in Modern Flash Memories [J].
Hareedy, Ahmed ;
Zheng, Simeng ;
Siegel, Paul ;
Calderbank, Robert .
IEEE TRANSACTIONS ON COMMUNICATIONS, 2023, 71 (12) :6834-6848
[10]   The Secret Arithmetic of Patterns: A General Method for Designing Constrained Codes Based on Lexicographic Indexing [J].
Hareedy, Ahmed ;
Dabak, Beyza ;
Calderbank, Robert .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (09) :5747-5778