Reducing space for index implementation

被引:14
作者
Crochemore, M [1 ]
机构
[1] Univ Marne La Vallee, Inst Gaspard Monge, F-77454 Marne La Vallee 2, France
关键词
data retrieval; suffix tree; suffix automaton; DAWG; suffix oracle; index; text compression; pattern matching;
D O I
10.1016/S0304-3975(01)00222-5
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
This article considers several strategies to implement efficiently full indexes on raw textual data. Indexes are based on representations of all the suffixes of the original text, for which we describe three types of implementations aimed at reducing the memory space. The first method is a combination of compaction and minimization that leads to the compact suffix automaton. As a second method we show that considering a complement language can be useful especially when it is related to data compression. Finally, approximation of the set of suffixes is the third technique used to reduce the space of the implementation. (C) 2002 Elsevier Science B.V. All rights reserved.
引用
收藏
页码:185 / 197
页数:13
相关论文
共 31 条
  • [1] EFFICIENT STRING MATCHING - AID TO BIBLIOGRAPHIC SEARCH
    AHO, AV
    CORASICK, MJ
    [J]. COMMUNICATIONS OF THE ACM, 1975, 18 (06) : 333 - 340
  • [2] Allauzen C, 1999, LECT NOTES COMPUT SC, V1725, P295
  • [3] EFFICIENT IMPLEMENTATION OF SUFFIX TREES
    ANDERSSON, A
    NILSSON, S
    [J]. SOFTWARE-PRACTICE & EXPERIENCE, 1995, 25 (02) : 129 - 141
  • [4] BAEZAYATES RA, 1999, MODERN INFORMATION R
  • [5] BALIK M, 2000, DCPSR200002 CZECH TU
  • [6] Beal M.-P., 1996, STACS 96. 13th Annual Symposium on Theoretical Aspects of Computer Science. Proceedings, P555
  • [7] Berstel J., 1979, TEUBNER STUDIENBUCHE, V38
  • [8] AVERAGE SIZES OF SUFFIX TREES AND DAWGS
    BLUMER, A
    EHRENFEUCHT, A
    HAUSSLER, D
    [J]. DISCRETE APPLIED MATHEMATICS, 1989, 24 (1-3) : 37 - 45
  • [9] THE SMALLEST AUTOMATION RECOGNIZING THE SUBWORDS OF A TEXT
    BLUMER, A
    BLUMER, J
    HAUSSLER, D
    EHRENFEUCHT, A
    CHEN, MT
    SEIFERAS, J
    [J]. THEORETICAL COMPUTER SCIENCE, 1985, 40 (01) : 31 - 55
  • [10] TRANSDUCERS AND REPETITIONS
    CROCHEMORE, M
    [J]. THEORETICAL COMPUTER SCIENCE, 1986, 45 (01) : 63 - 86