Reverse-Safe Data Structures for Text Indexing

被引:0
|
作者
Bernardini, Giulia [1 ,2 ]
Chen, Huiping [3 ]
Fici, Gabriele [4 ]
Loukides, Grigorios [3 ]
Pissis, Solon P. [2 ]
机构
[1] Univ Milano Bicocca, Milan, Italy
[2] CWI, Amsterdam, Netherlands
[3] Kings Coll London, London, England
[4] Univ Palermo, Palermo, Italy
来源
2020 PROCEEDINGS OF THE SYMPOSIUM ON ALGORITHM ENGINEERING AND EXPERIMENTS, ALENEX | 2020年
关键词
COMMON SUBSTRING APPROACH; ABELIAN EQUIVALENCE; ANONYMIZATION; CONSTRUCTION; SPACE;
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O (n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n! log d) time, where ! is the matrix multiplication exponent. We show that, despite the n! factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
引用
收藏
页码:199 / 213
页数:15
相关论文
共 50 条
  • [31] INDEXING AND SEARCHING IN STATUTORY TEXT
    CORBETT, M
    LAW LIBRARY JOURNAL, 1992, 84 (04): : 759 - 767
  • [32] Forty Years of Text Indexing
    Apostolico, Alberto
    Crochemore, Maxime
    Farach-Colton, Martin
    Galil, Zvi
    Muthukrishnan, S.
    COMBINATORIAL PATTERN MATCHING, 2013, 7922 : 1 - 10
  • [33] Document indexing in text categorization
    Zhang, QR
    Zhang, L
    Dong, SB
    Tan, JH
    PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 3792 - 3796
  • [34] Automatic Subject Indexing of Text
    Golub, Koraljka
    KNOWLEDGE ORGANIZATION, 2019, 46 (02): : 104 - 121
  • [35] Compressed Text Indexing with Wildcards
    Hon, Wing-Kai
    Ku, Tsung-Han
    Shah, Rahul
    Thankachan, Sharma V.
    Vitter, Jeffrey Scott
    STRING PROCESSING AND INFORMATION RETRIEVAL, 2011, 7024 : 267 - +
  • [36] Improved dynamic text indexing
    Ferragina, P
    Grossi, R
    JOURNAL OF ALGORITHMS, 1999, 31 (02) : 291 - 319
  • [37] FROM TEXT TO HYPERTEXT BY INDEXING
    SALMINEN, A
    TAGUESUTCLIFFE, J
    MCCLELLAN, C
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1995, 13 (01) : 69 - 99
  • [38] Succinct Text Indexing with Wildcards
    Tam, Alan
    Wu, Edward
    Lam, Tak-Wah
    Yiu, Siu-Ming
    STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5721 : 39 - 50
  • [39] Universal compressed text indexing
    Navarro, Gonzalo
    Prezza, Nicola
    THEORETICAL COMPUTER SCIENCE, 2019, 762 : 41 - 50
  • [40] Online timestamped text indexing
    Amir, A
    Landau, GM
    Ukkonen, E
    INFORMATION PROCESSING LETTERS, 2002, 82 (05) : 253 - 259