Text indexing with errors

被引:0
|
作者
Maass, MG [1 ]
Nowak, J [1 ]
机构
[1] Tech Univ Munich, Fak Informat, D-85748 Garching, Germany
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs in time linear in the size of the query string and the number of results. This improves over previous work where the lookup time is not linear or depends upon the size of the document corpus. Our data structure has size O(n log(k) n) on average and with high probability for input size n and queries with up to k errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear lookup time on average.
引用
收藏
页码:21 / 32
页数:12
相关论文
共 50 条
  • [21] Automatic text segmentation and text recognition for video indexing
    Rainer Lienhart
    Wolfgang Effelsberg
    Multimedia Systems, 2000, 8 : 69 - 81
  • [22] Texture, text, and context of the folklore text vs indexing
    Jason, H
    JOURNAL OF FOLKLORE RESEARCH, 1997, 34 (03) : 221 - 225
  • [23] Errors in DOI indexing by bibliometric databases
    Franceschini, Fiorenzo
    Maisano, Domenico
    Mastrogiacomo, Luca
    SCIENTOMETRICS, 2015, 102 (03) : 2181 - 2186
  • [24] Errors in DOI indexing by bibliometric databases
    Fiorenzo Franceschini
    Domenico Maisano
    Luca Mastrogiacomo
    Scientometrics, 2015, 102 : 2181 - 2186
  • [25] Reverse-Safe Text Indexing
    Bernardini G.
    Chen H.
    Fici G.
    Loukides G.
    Pissis S.P.
    1600, Association for Computing Machinery (26):
  • [26] DYNAMIC INDEXING FOR THE ANALYSIS OF LITERARY TEXT
    WEISS, H
    INTERNATIONAL CLASSIFICATION, 1991, 18 (04): : 200 - 204
  • [27] COMPUTER EVALUATION OF INDEXING AND TEXT PROCESSING
    SALTON, G
    LESK, ME
    JOURNAL OF THE ACM, 1968, 15 (01) : 8 - &
  • [28] Text segmentation by latent semantic indexing
    Ishioka, T
    NEW DEVELOPMENTS IN PSYCHOMETRICS, 2003, : 689 - 696
  • [29] Sparse Text Indexing in Small Space
    Bille, Philip
    Fischer, Johannes
    Gortz, Inge Li
    Kopelowitz, Tsvi
    Sach, Benjamin
    Vildhoj, Hjalte Wedel
    ACM TRANSACTIONS ON ALGORITHMS, 2016, 12 (03)
  • [30] Parameterized Text Indexing with One Wildcard
    Ganguly, Arnab
    Hon, Wing-Kai
    Huang, Yu-An
    Pissis, Solon P.
    Shah, Rahul
    Thankachan, Sharma, V
    2019 DATA COMPRESSION CONFERENCE (DCC), 2019, : 152 - 161