Reverse-Safe Data Structures for Text Indexing

被引:0
|
作者
Bernardini, Giulia [1 ,2 ]
Chen, Huiping [3 ]
Fici, Gabriele [4 ]
Loukides, Grigorios [3 ]
Pissis, Solon P. [2 ]
机构
[1] Univ Milano Bicocca, Milan, Italy
[2] CWI, Amsterdam, Netherlands
[3] Kings Coll London, London, England
[4] Univ Palermo, Palermo, Italy
来源
2020 PROCEEDINGS OF THE SYMPOSIUM ON ALGORITHM ENGINEERING AND EXPERIMENTS, ALENEX | 2020年
关键词
COMMON SUBSTRING APPROACH; ABELIAN EQUIVALENCE; ANONYMIZATION; CONSTRUCTION; SPACE;
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O (n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n! log d) time, where ! is the matrix multiplication exponent. We show that, despite the n! factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
引用
收藏
页码:199 / 213
页数:15
相关论文
共 50 条
  • [21] INDEXING AND DATA PROCESSING OF MODULATED STRUCTURES ON CCD DETECTORS
    van Meurs, F.
    Straver, L. H.
    Duisenberg, A. J. M.
    Schreurs, A. M. M.
    Hooft, R. W. W.
    ACTA CRYSTALLOGRAPHICA A-FOUNDATION AND ADVANCES, 2002, 58 : C181 - C181
  • [22] From Indexing Data Structures to de Bruijn Graphs
    Cazaux, Bastien
    Lecroq, Thierry
    Rivals, Eric
    COMBINATORIAL PATTERN MATCHING, CPM 2014, 2014, 8486 : 89 - 99
  • [23] Transaction safe nonblocking data structures
    Marathe, Virendra J.
    Spear, Michael F.
    Scott, Michael L.
    DISTRIBUTED COMPUTING, PROCEEDINGS, 2007, 4731 : 488 - +
  • [24] Contracted Suffix Trees: A Simple and Dynamic Text Indexing Data Structure
    Ehrenfeucht, Andrzej
    McConnell, Ross M.
    Woo, Sung-Whan
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2009, 5577 : 41 - +
  • [25] ABSTRACTS - APPLICATION OF DATA PROCESSING EQUIPMENT TO CLASSIFICATION INDEXING AND TEXT PROCESSING
    不详
    AMERICAN DOCUMENTATION, 1965, 16 (01): : 49 - &
  • [26] Text Indexing, Suffix Sorting, and Data Compression: Common Problems and Techniques
    Grossi, Roberto
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2009, 5577 : 39 - 40
  • [27] COMPARISON OF PHILOSOPHY OF INDEXING TEXT WITH THAT OF INDEXING STRUCTURAL FORMULAE
    KENT, AK
    ASLIB PROCEEDINGS, 1967, 19 (11): : 364 - &
  • [28] Succincter Text Indexing with Wildcards
    Thachuk, Chris
    COMBINATORIAL PATTERN MATCHING, 22ND ANNUAL SYMPOSIUM, CPM 2011, 2011, 6661 : 27 - 40
  • [29] The gold text indexing engine
    Barbara, D
    Mehrotra, S
    Vallabhaneni, P
    PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, 1996, : 172 - 179
  • [30] Compressed text indexing with wildcards
    Hon, Wing-Kai
    Ku, Tsung-Han
    Shah, Rahul
    Thankachan, Sharma V.
    Vitter, Jeffrey Scott
    JOURNAL OF DISCRETE ALGORITHMS, 2013, 19 (19) : 23 - 29