Text indexing and dictionary matching with one error

被引:47
|
作者
Amir, A [1 ]
Keselman, D
Landau, GM
Lewenstein, M
Lewenstein, N
Rodeh, M
机构
[1] Bar Ilan Univ, Dept Math & Comp Sci, IL-52900 Ramat Gan, Israel
[2] Georgia Tech, Atlanta, GA USA
[3] Simons Technol, Decatur, GA 30030 USA
[4] Polytech Univ, Dept Comp & Informat Sci, Metrotech Ctr 6, Brooklyn, NY 11201 USA
[5] Univ Haifa, Dept Comp Sci, IL-31905 Haifa, Israel
[6] MATAM, Ctr Adv Technol, IBM, Res Lab Haifa, IL-31905 Haifa, Israel
基金
以色列科学基金会; 美国国家科学基金会;
关键词
D O I
10.1006/jagm.2000.1104
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The indexing problem is where a text is preprocessed and subsequent queries of the form "Find all occurrences of pattern P in the text" are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem a set of patterns is preprocessed and subsequent queries of the form "Find all occurrences of dictionary patterns in text T" are answered in time proportional to the length of the text and the number of occurrences. There exist efficient worst-case solutions for the indexing problem and the dictionary matching problem, but none that find approximate occurrences of the patterns, i.e., where the pattern is within a bound edit (or Hamming) distance from the appropriate text location. In this paper we present a uniform deterministic solution to both the indexing and the general dictionary matching problem with one error. We preprocess the data in time O(n log(2) n), where n is the text size in the indexing problem and the dictionary size in the dictionary matching problem. Our query time for the indexing problem is O(m log n log log n + tocc), where m is the query string size and tocc is the number of occurrences, Our query time for the dictionary matching problem is O(n log(3) d log log d + tocc), where n is the text size and d the dictionary size. The time bounds above apply to both bounded and unbounded alphabets, (C) 2000 Academic Press.
引用
收藏
页码:309 / 325
页数:17
相关论文
共 50 条
  • [1] Indexing and dictionary matching with one error (Extended abstract)
    Amir, A
    Keselman, D
    Landau, GM
    Lewenstein, M
    Lewenstein, N
    Rodeh, M
    ALGORITHMS AND DATA STRUCTURES, 1999, 1663 : 181 - 192
  • [2] Compressed Dictionary Matching With One Error
    Hon, Wing-Kai
    Ku, Tsung-Han
    Shah, Rahul
    Thankachan, Sharma V.
    Vitter, Jeffrey Scott
    2011 DATA COMPRESSION CONFERENCE (DCC), 2011, : 113 - 122
  • [3] A new method for approximate indexing and dictionary lookup with one error
    Maass, MG
    Nowak, J
    INFORMATION PROCESSING LETTERS, 2005, 96 (05) : 185 - 191
  • [4] Indexing a dictionary for subset matching queries
    Landau, Cad M.
    Tsur, Dekel
    Weimann, Oren
    STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2007, 4726 : 195 - +
  • [5] Indexing a Dictionary for Subset Matching Queries
    Landau, Gad M.
    Tsur, Dekel
    Weimann, Oren
    ALGORITHMS AND APPLICATIONS: ESSAYS DEDICATED TO ESKO UKKONEN ON THE OCCASION OF HIS 60TH BIRTHDAY, 2010, 6060 : 158 - +
  • [6] Text Indexing for Regular Expression Matching
    Gibney, Daniel
    Thankachan, Sharma, V
    ALGORITHMS, 2021, 14 (05)
  • [7] Dictionary Matching with One Gap
    Amir, Amihood
    Levy, Avivit
    Porat, Ely
    Shalom, B. Riva
    COMBINATORIAL PATTERN MATCHING, CPM 2014, 2014, 8486 : 11 - 20
  • [8] Parameterized Text Indexing with One Wildcard
    Ganguly, Arnab
    Hon, Wing-Kai
    Huang, Yu-An
    Pissis, Solon P.
    Shah, Rahul
    Thankachan, Sharma, V
    2019 DATA COMPRESSION CONFERENCE (DCC), 2019, : 152 - 161
  • [9] Dictionary Matching with a Bounded Gap in Pattern or in Text
    Hon, Wing-Kai
    Lam, Tak-Wah
    Shah, Rahul
    Thankachan, Sharma V.
    Ting, Hing-Fung
    Yang, Yilin
    ALGORITHMICA, 2018, 80 (02) : 698 - 713
  • [10] Dictionary Matching with a Bounded Gap in Pattern or in Text
    Wing-Kai Hon
    Tak-Wah Lam
    Rahul Shah
    Sharma V. Thankachan
    Hing-Fung Ting
    Yilin Yang
    Algorithmica, 2018, 80 : 698 - 713