Text indexing with errors

被引:0
|
作者
Maass, MG [1 ]
Nowak, J [1 ]
机构
[1] Tech Univ Munich, Fak Informat, D-85748 Garching, Germany
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs in time linear in the size of the query string and the number of results. This improves over previous work where the lookup time is not linear or depends upon the size of the document corpus. Our data structure has size O(n log(k) n) on average and with high probability for input size n and queries with up to k errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear lookup time on average.
引用
收藏
页码:21 / 32
页数:12
相关论文
共 50 条
  • [31] SEMIAUTOMATIC INDEXING OF STRUCTURED INFORMATION OF TEXT
    NISHIDA, F
    TAKAMATSU, S
    FUJITA, Y
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1984, 24 (01): : 15 - 20
  • [32] Concept indexing for automated text categorization
    Gómez, JM
    Cortizo, JC
    Puertas, E
    Ruiz, M
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2004, 3136 : 195 - 206
  • [33] Keywords, Indexing, Text Analysis: An Editorial
    Smiraglia, Richard P.
    KNOWLEDGE ORGANIZATION, 2013, 40 (03): : 155 - 159
  • [34] Overlapping statistical word indexing: A new indexing method for Japanese text
    Ogawa, Y
    Matsuda, T
    PROCEEDINGS OF THE 20TH ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1997, : 226 - 234
  • [35] Text influenced molecular indexing (TIMI).
    Singh, SB
    Hull, RD
    Fluder, EM
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2001, 221 : U393 - U393
  • [36] Text Indexing for Regular Expression Matching
    Gibney, Daniel
    Thankachan, Sharma, V
    ALGORITHMS, 2021, 14 (05)
  • [37] Classification of Errors in Text
    Busta, Jan
    Hlavackova, Dana
    Jakubicek, Milos
    Pala, Karel
    RASLAN 2009: RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING, 2009, : 109 - 119
  • [38] Text corpus with errors
    Pala, K
    Rychly, P
    Smrz, P
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2003, 2807 : 90 - 97
  • [39] Spanish name indexing errors in international databases
    Ruiz-Pérez, R
    López-Cózar, ED
    Jimémez-Contreras, E
    LANCET, 2003, 361 (9369): : 1656 - 1657
  • [40] CHECKING INDEXING ERRORS OF ROTARY POSITIONING TABLES
    MINAIRE, C
    MACHINERY AND PRODUCTION ENGINEERING, 1971, 119 (3077): : 661 - &