Position heaps: A simple and dynamic text indexing data structure

被引:24
|
作者
Ehrenfeucht, Andrzej [1 ]
McConnell, Ross M. [2 ]
Osheim, Nissa [2 ]
Woo, Sung-Whan [2 ]
机构
[1] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO 80309 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
关键词
Position heap; String searching;
D O I
10.1016/j.jda.2010.12.001
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
We address the problem of finding the locations of all instances of a string P in a text T, where preprocessing of T is allowed in order to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve (1970) [ 3] for hashing, and adapt it to the new problem. We can then produce a list of k occurrences of any string P in T in O(parallel to P parallel to + k) time. Because of properties shared by suffixes of a text that are not shared by arbitrary hash keys, we can build the structure in O(parallel to T parallel to) time, which is much faster than Coffman and Eve's algorithm. These bounds are as good as those for the suffix tree, suffix array, and the compact DAWG. The advantages are the elementary nature of some of the algorithms for constructing and using the data structure and the asymptotic bounds we can give for updating the data structure when the text is edited. (C) 2010 Elsevier B.V. All rights reserved.
引用
收藏
页码:100 / 121
页数:22
相关论文
共 50 条
  • [31] Dynamic Dimension Indexing for Efficient Skyline Maintenance on Data Streams
    Liu, Rui
    Li, Dominique
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT III, 2020, 12114 : 272 - 287
  • [32] Dynamic Indexing for Incremental Entity Resolution in Data Integration Systems
    Vieira, Priscilla Kelly M.
    Loscio, Bernadette Farias
    Salgado, Ana Carolina
    ICEIS: PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS - VOL 1, 2017, : 185 - 192
  • [33] An efficient indexing method for wireless data broadcast with dynamic updates
    Yan, HA
    Lee, YH
    2002 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS AND WEST SINO EXPOSITION PROCEEDINGS, VOLS 1-4, 2002, : 358 - 362
  • [34] A Versatile and Efficient GPU Data Structure for Spatial Indexing
    Schneider, Jens
    Rautek, Peter
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2017, 23 (01) : 911 - 920
  • [35] Efficient Data and Indexing Structure for Blockchains in Enterprise Systems
    Riegger, Christian
    Vincon, Tobias
    Petrov, Ilia
    IIWAS2018: THE 20TH INTERNATIONAL CONFERENCE ON INFORMATION INTEGRATION AND WEB-BASED APPLICATIONS & SERVICES, 2014, : 173 - 182
  • [36] A dynamic indexing structure for searching time-series patterns
    Kim, YI
    Park, Y
    Chun, J
    TWENTIETH ANNUAL INTERNATIONAL COMPUTER SOFTWARE & APPLICATIONS CONFERENCE (COMPSAC'96), PROCEEDINGS, 1996, 20 : 270 - 275
  • [37] Chronofold: a data structure for versioned text
    Grishchenko, Victor
    Patrakeev, Mikhail
    7TH WORKSHOP ON PRINCIPLES AND PRACTICE OF CONSISTENCY FOR DISTRIBUTED DATA (PAPOC '20), 2020,
  • [38] Text Generation From Data With Dynamic Planning
    Yang, Sen
    Liu, Yang
    Feng, Dawei
    Li, Dongsheng
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 26 - 34
  • [39] Indexing and querying algorithm based on structure indexing for managing massive-scale RDF data
    Bae, Minho
    Kihm, Jangsu
    Kang, Sanggil
    Oh, Sangyoon
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 27 (02) : 575 - 587
  • [40] Mining multiple informational text structure from text data
    Das, Syaamantak
    Das Mandal, Shyamal Kumar
    Basu, Anupam
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND DATA SCIENCE, 2020, 167 : 2211 - 2220