Position heaps: A simple and dynamic text indexing data structure

Cited by: 24
Authors
Ehrenfeucht, Andrzej [1]
McConnell, Ross M. [2]
Osheim, Nissa [2]
Woo, Sung-Whan [2]
Affiliations
[1] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO 80309 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
Position heap; String searching
DOI
10.1016/j.jda.2010.12.001
Chinese Library Classification number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
We address the problem of finding the locations of all instances of a string P in a text T, where preprocessing of T is allowed in order to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve (1970) [3] for hashing, and adapt it to the new problem. We can then produce a list of k occurrences of any string P in T in O(|P| + k) time. Because of properties shared by suffixes of a text that are not shared by arbitrary hash keys, we can build the structure in O(|T|) time, which is much faster than Coffman and Eve's algorithm. These bounds are as good as those for the suffix tree, suffix array, and the compact DAWG. The advantages are the elementary nature of some of the algorithms for constructing and using the data structure and the asymptotic bounds we can give for updating the data structure when the text is edited. © 2010 Elsevier B.V. All rights reserved.
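The idea behind the structure can be illustrated with a minimal Python sketch. It is a naive version written under stated assumptions, not the authors' algorithm: it inserts the suffixes of T one at a time, shortest first, so construction is not the O(|T|) procedure the abstract refers to, and the query verifies candidates by direct comparison, so it does not achieve the O(|P| + k) bound. The names `Node`, `build_position_heap`, and `find_all` are hypothetical. Each trie node stores exactly one text position; a query walks down the trie along P, checks the positions stored on the path against the text, and reports every position in the subtree reached once P is fully consumed.

```python
class Node:
    """One trie node; holds exactly one text position (None for the root)."""
    __slots__ = ("pos", "children")

    def __init__(self, pos):
        self.pos = pos
        self.children = {}


def build_position_heap(text):
    """Naive construction: insert suffixes from shortest to longest."""
    root = Node(None)
    n = len(text)
    for i in range(n - 1, -1, -1):
        node, j = root, i
        # Follow the longest existing path that spells a prefix of text[i:].
        while j < n and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
        # j < n here: every existing node is shallower than len(text[i:]),
        # because it was created by a strictly shorter suffix.
        node.children[text[j]] = Node(i)
    return root


def find_all(root, text, pattern):
    """Return the sorted start positions of pattern in text."""
    hits = []
    node = root
    for ch in pattern:
        # Nodes whose path label is a proper prefix of the pattern are only
        # candidates, so they are verified against the text.
        if node.pos is not None and text.startswith(pattern, node.pos):
            hits.append(node.pos)
        node = node.children.get(ch)
        if node is None:
            return sorted(hits)
    # Pattern fully consumed: every position in this subtree is an occurrence.
    stack = [node]
    while stack:
        cur = stack.pop()
        if cur.pos is not None:
            hits.append(cur.pos)
        stack.extend(cur.children.values())
    return sorted(hits)


text = "banana"
root = build_position_heap(text)
print(find_all(root, text, "ana"))  # [1, 3]
```

The sketch keeps one node per text position, which is why the structure needs only as many nodes as the text has characters. The sorting step is for readable output only and is not part of the data structure.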
Pages: 100 - 121
Page count: 22
Related papers
50 in total
  • [21] An Opportunistic Text Indexing Structure Based on Run Length Encoding
    Tamakoshi, Yuya
    Goto, Keisuke
    Inenaga, Shunsuke
    Bannai, Hideo
    Takeda, Masayuki
    ALGORITHMS AND COMPLEXITY (CIAC 2015), 2015, 9079 : 390 - 402
  • [22] Predictive Indexing for Position Data of Moving Objects in the Real World
    Yanagisawa, Yutaka
    TRANSACTIONS ON COMPUTATIONAL SCIENCE VI, 2009, 5730 : 77 - 94
  • [23] Predictive indexing for position data of moving objects in the real world
    Yanagisawa, Yutaka
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2008, PT 1, PROCEEDINGS, 2008, 5072 : 615 - 630
  • [24] Position Index Preserving Compression for Text Data
    Akhtar, Md. Nasim
    Rashid, Md. Mamunur
    Islam, Md. Shafiqul
    Kashem, Mohammod Abul
    Kolybanov, Cyrll Y.
    JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY, 2011, 11 (01): 9 - 14
  • [25] ABSTRACTS - APPLICATION OF DATA PROCESSING EQUIPMENT TO CLASSIFICATION INDEXING AND TEXT PROCESSING
    Unknown
    AMERICAN DOCUMENTATION, 1965, 16 (01): 49 - &
  • [26] Text Indexing, Suffix Sorting, and Data Compression: Common Problems and Techniques
    Grossi, Roberto
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2009, 5577 : 39 - 40
  • [27] Dynamic Data Retrieval Using Incremental Clustering and Indexing
    Priya, Uma D.
    Thilagam, Santhi P.
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2020, 10 (03) : 74 - 91
  • [28] ALGORITHMIC METHOD OF SELECTIVE INDEXING OF SIMPLE-STRUCTURE DOCUMENTS
    SOKOLOV, AV
    KOKORINA, AP
    NAUCHNO-TEKHNICHESKAYA INFORMATSIYA SERIYA 2-INFORMATSIONNYE PROTSESSY I SISTEMY, 1974, (05): 11 - 16
  • [29] Arabic text data mining: A root-based hierarchical indexing model
    Eldos, T.M.
    International Journal of Modelling and Simulation, 2003, 23 (03): 158 - 166
  • [30] Adaptive indexing structure on XML data stored in RDBMS
    College of Computer Science, Zhejiang University, Hangzhou 310027, China
    J. Comput. Inf. Syst., 2008, (1): 351 - 360