Text indexing with errors

被引:0
|
作者
Maass, MG [1 ]
Nowak, J [1 ]
机构
[1] Tech Univ Munich, Fak Informat, D-85748 Garching, Germany
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs in time linear in the size of the query string and the number of results. This improves over previous work where the lookup time is not linear or depends upon the size of the document corpus. Our data structure has size O(n log(k) n) on average and with high probability for input size n and queries with up to k errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear lookup time on average.
引用
收藏
页码:21 / 32
页数:12
相关论文
共 50 条
  • [1] Text indexing with errors
    Maass, Moritz G.
    Nowak, Johannes
    JOURNAL OF DISCRETE ALGORITHMS, 2007, 5 (04) : 662 - 681
  • [2] SEMANTIC TEXT INDEXING
    Kaleta, Zbigniew
    COMPUTER SCIENCE-AGH, 2014, 15 (01): : 19 - 34
  • [3] Text Mining for Indexing
    Gelernter, Judith
    Lesk, Michael
    JCDL 09: PROCEEDINGS OF THE 2009 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES, 2009, : 467 - 467
  • [4] Indexing compressed text
    Ferragina, P
    Manzini, G
    JOURNAL OF THE ACM, 2005, 52 (04) : 552 - 581
  • [5] COMPARISON OF PHILOSOPHY OF INDEXING TEXT WITH THAT OF INDEXING STRUCTURAL FORMULAE
    KENT, AK
    ASLIB PROCEEDINGS, 1967, 19 (11): : 364 - &
  • [6] Compressed text indexing with wildcards
    Hon, Wing-Kai
    Ku, Tsung-Han
    Shah, Rahul
    Thankachan, Sharma V.
    Vitter, Jeffrey Scott
    JOURNAL OF DISCRETE ALGORITHMS, 2013, 19 (19) : 23 - 29
  • [7] Succincter Text Indexing with Wildcards
    Thachuk, Chris
    COMBINATORIAL PATTERN MATCHING, 22ND ANNUAL SYMPOSIUM, CPM 2011, 2011, 6661 : 27 - 40
  • [8] INDEXING AND SEARCHING IN STATUTORY TEXT
    CORBETT, M
    LAW LIBRARY JOURNAL, 1992, 84 (04): : 759 - 767
  • [9] The gold text indexing engine
    Barbara, D
    Mehrotra, S
    Vallabhaneni, P
    PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, 1996, : 172 - 179
  • [10] Forty Years of Text Indexing
    Apostolico, Alberto
    Crochemore, Maxime
    Farach-Colton, Martin
    Galil, Zvi
    Muthukrishnan, S.
    COMBINATORIAL PATTERN MATCHING, 2013, 7922 : 1 - 10