Text indexing and dictionary matching with one error

Cited by: 47

Authors:

Amir, A [1]

Keselman, D

Landau, GM

Lewenstein, M

Lewenstein, N

Rodeh, M

Affiliations:

[1] Bar Ilan Univ, Dept Math & Comp Sci, IL-52900 Ramat Gan, Israel

[2] Georgia Tech, Atlanta, GA USA

[3] Simons Technol, Decatur, GA 30030 USA

[4] Polytech Univ, Dept Comp & Informat Sci, Metrotech Ctr 6, Brooklyn, NY 11201 USA

[5] Univ Haifa, Dept Comp Sci, IL-31905 Haifa, Israel

[6] MATAM, Ctr Adv Technol, IBM, Res Lab Haifa, IL-31905 Haifa, Israel

Source:

JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC | 2000 / Vol. 37 / Issue 02

Funding:

Israel Science Foundation; U.S. National Science Foundation

Keywords:

DOI:

10.1006/jagm.2000.1104

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

In the indexing problem, a text is preprocessed so that subsequent queries of the form "Find all occurrences of pattern P in the text" are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem, a set of patterns is preprocessed so that subsequent queries of the form "Find all occurrences of dictionary patterns in text T" are answered in time proportional to the length of the text and the number of occurrences. There exist efficient worst-case solutions for the indexing problem and the dictionary matching problem, but none that find approximate occurrences of the patterns, i.e., where the pattern is within a bounded edit (or Hamming) distance from the appropriate text location. In this paper we present a uniform deterministic solution to both the indexing and the general dictionary matching problem with one error. We preprocess the data in time $O(n \log^2 n)$, where n is the text size in the indexing problem and the dictionary size in the dictionary matching problem. Our query time for the indexing problem is $O(m \log n \log\log n + tocc)$, where m is the query string size and tocc is the number of occurrences. Our query time for the dictionary matching problem is $O(n \log^3 d \log\log d + tocc)$, where n is the text size and d is the dictionary size. The time bounds above apply to both bounded and unbounded alphabets. © 2000 Academic Press.
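
To make the query semantics concrete, here is a minimal brute-force sketch of "matching with one error" under Hamming distance (at most one mismatched character). It is not the paper's algorithm: it does no preprocessing and runs in O(nm) time per query, whereas the abstract describes an indexed solution. The function name `occurrences_with_one_mismatch` is a hypothetical choice for illustration, and the edit-distance variant (insertions and deletions) is not covered.

```python
def occurrences_with_one_mismatch(text: str, pattern: str) -> list[int]:
    """Return every start position where `pattern` aligns with `text`
    with at most one mismatched character (Hamming distance <= 1).

    Naive baseline for illustration only; assumed, not taken from the paper.
    """
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > 1:
                    break  # more than one error: reject this alignment
        if mismatches <= 1:
            hits.append(start)
    return hits


# "acra" differs from "abra" in exactly one position, so both
# occurrences of "abra" in the text are reported.
print(occurrences_with_one_mismatch("abracadabra", "acra"))
```

An index in the sense of the abstract would answer the same query without scanning every alignment of the text.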


Pages: 309-325

Page count: 17
