Reverse-Safe Data Structures for Text Indexing

被引:0
|
作者
Bernardini, Giulia [1 ,2 ]
Chen, Huiping [3 ]
Fici, Gabriele [4 ]
Loukides, Grigorios [3 ]
Pissis, Solon P. [2 ]
机构
[1] Univ Milano Bicocca, Milan, Italy
[2] CWI, Amsterdam, Netherlands
[3] Kings Coll London, London, England
[4] Univ Palermo, Palermo, Italy
来源
2020 PROCEEDINGS OF THE SYMPOSIUM ON ALGORITHM ENGINEERING AND EXPERIMENTS, ALENEX | 2020年
关键词
COMMON SUBSTRING APPROACH; ABELIAN EQUIVALENCE; ANONYMIZATION; CONSTRUCTION; SPACE;
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O (n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n! log d) time, where ! is the matrix multiplication exponent. We show that, despite the n! factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
引用
收藏
页码:199 / 213
页数:15
相关论文
共 50 条
  • [41] Thai Word Safe Segmentation with Bounding Extension for Data Indexing in Search Engine
    Klahan, Achariya
    Pannoi, Sukunya
    Uewichitrapochana, Prapeepat
    Wiangsripanawan, Rungrat
    RECENT ADVANCES IN INFORMATION AND COMMUNICATION TECHNOLOGY 2018, 2019, 769 : 83 - 92
  • [42] Use of reverse text-mining to establish whether indexing and classification of chemical patents is still necessary
    Stembridge, Robert A.
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2014, 248
  • [43] Arabic text data mining: A root-based hierarchical indexing model
    Eldos, T.M.
    International Journal of Modelling and Simulation, 2003, 23 (03): : 158 - 166
  • [44] Automatic text segmentation and text recognition for video indexing
    Lienhart, R
    Effelsberg, W
    MULTIMEDIA SYSTEMS, 2000, 8 (01) : 69 - 81
  • [45] Automatic text segmentation and text recognition for video indexing
    Rainer Lienhart
    Wolfgang Effelsberg
    Multimedia Systems, 2000, 8 : 69 - 81
  • [46] Texture, text, and context of the folklore text vs indexing
    Jason, H
    JOURNAL OF FOLKLORE RESEARCH, 1997, 34 (03) : 221 - 225
  • [47] COMBINING CHEMICAL STRUCTURES WITH DATA AND TEXT ON A MICROCOMPUTER
    TOWN, WG
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1986, 192 : 46 - CINF
  • [48] COMBINING CHEMICAL STRUCTURES WITH DATA AND TEXT ON A MICROCOMPUTER
    TOWN, WG
    ACS SYMPOSIUM SERIES, 1987, 341 : 9 - 17
  • [49] Linking indexing data structures to de Bruijn graphs: Construction and update
    Cazaux, Bastien
    Lecroq, Thierry
    Rivals, Eric
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2019, 104 : 165 - 183
  • [50] DYNAMIC INDEXING FOR THE ANALYSIS OF LITERARY TEXT
    WEISS, H
    INTERNATIONAL CLASSIFICATION, 1991, 18 (04): : 200 - 204