A four-stage algorithm for updating a Burrows-Wheeler transform

被引:12
作者
Salson, M. [1 ]
Lecroq, T. [1 ]
Leonard, M. [1 ]
Mouchard, L. [1 ,2 ]
机构
[1] Univ Rouen, LITIS EA 4108, F-76821 Mont St Aignan, France
[2] Kings Coll London, Dept Comp Sci, Algorithm Design Grp, London WC2R 2LS, England
关键词
Burrows-Wheeler transform; Compression; Dynamic; Suffix array; Edit operations; Algorithm design; Self-index data structures;
D O I
10.1016/j.tcs.2009.07.016
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We present a four-stage algorithm that updates the Burrows-Wheeler Transform of a text T, when this text is modified. The Burrows-Wheeler Transform is used by many text compression applications and some self-index data structures. It operates by reordering the letters of a text T to obtain a new text bwt(T) which can be better compressed. Even though recent advances are offering this structure new applications, a major bottleneck still exists: bwt(T) has to be entirely reconstructed from scratch whenever T is modified. We study how standard edit operations (insertion, deletion, substitution of a letter or a factor) that transform a text T into T' impact bwt(T). Then we present an algorithm that directly converts bwt(T) into bwt(T'). Based on this algorithm, we also sketch a method for converting the suffix array of T into the suffix array of T'. We finally show, based on the experiments we conducted, that this algorithm, whose worst-case time complexity is O(vertical bar T vertical bar log vertical bar T vertical bar(1 + log sigma/log log vertical bar T vertical bar)), performs really well in practice and replaces advantageously the traditional approach. (C) 2009 Elsevier B.V. All rights reserved.
引用
收藏
页码:4350 / 4359
页数:10
相关论文
共 15 条
[1]  
Burrows M, 1994, 124 DEC
[2]   DATA-COMPRESSION USING ADAPTIVE CODING AND PARTIAL STRING MATCHING [J].
CLEARY, JG ;
WITTEN, IH .
IEEE TRANSACTIONS ON COMMUNICATIONS, 1984, 32 (04) :396-402
[3]   Unbounded length contexts for PPM [J].
Cleary, JG ;
Teahan, WJ .
COMPUTER JOURNAL, 1997, 40 (2-3) :67-75
[4]   A note on the Burrows-Wheeler transformation [J].
Crochemore, M ;
Désarménien, J ;
Perrin, D .
THEORETICAL COMPUTER SCIENCE, 2005, 332 (1-3) :567-572
[5]   Opportunistic data structures with applications [J].
Ferragina, P ;
Manzini, G .
41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, :390-398
[6]   Compressed Representations of Sequences and Full-Text Indexes [J].
Ferragina, Paolo ;
Manzini, Giovanni ;
Makinen, Veli ;
Navarro, Gonzalo .
ACM TRANSACTIONS ON ALGORITHMS, 2007, 3 (02)
[7]   The Burrows-Wheeler transform - Dedicated to the memory of David Wheeler (1927-2004) - Foreword [J].
Ferragina, Paolo ;
Manzini, Giovanni ;
Muthukrishnan, S. .
THEORETICAL COMPUTER SCIENCE, 2007, 387 (03) :197-198
[8]  
GERLACH W, 2007, THESIS U BIELEFELD G
[9]  
GONNET CH, 1992, INFORM RETRIEVAL DAT, P66
[10]   Improved dynamic rank-select entropy-bound structures [J].
Gonzalez, Rodrigo ;
Navarro, Gonzalo .
LATIN 2008: THEORETICAL INFORMATICS, 2008, 4957 :374-386