Bit Catastrophes for the Burrows-Wheeler Transform

被引:5
|
作者
Giuliani, Sara [1 ]
Inenaga, Shunsuke [2 ]
Liptak, Zsuzsanna [1 ]
Romana, Giuseppe [3 ]
Sciortino, Marinella [3 ]
Urbina, Cristian [4 ]
机构
[1] Univ Verona, Verona, Italy
[2] Kyushu Univ, Fukuoka, Japan
[3] Univ Palermo, Palermo, Italy
[4] Univ Chile, Santiago, Chile
来源
DEVELOPMENTS IN LANGUAGE THEORY, DLT 2023 | 2023年 / 13911卷
关键词
Burrows-Wheeler transform; Equal-letter run; Repetitiveness measure; Sensitivity;
D O I
10.1007/978-3-031-33264-7_8
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted r. We exhibit infinite families of strings in which insertion, deletion, resp. substitution of one character increases r from constant to Theta(log n), where n is the length of the string. These strings can be interpreted both as examples for an increase by a multiplicative or an additive Theta(log n)-factor. As regards multiplicative factor, they attain the upper bound given by Akagi, Funakoshi, and Inenaga [Inf & Comput. 2023] of O( log n log r), since here r = O(1). We then give examples of strings in which insertion, deletion, resp. substitution of a character increases r by a Theta(root n) additive factor. These strings significantly improve the best known lower bound for an additive factor of Omega(log n) [Giuliani et al., SOFSEM 2021].
引用
收藏
页码:86 / 99
页数:14
相关论文
共 50 条
  • [31] Genomic Phylogeny Using the MaxwellTM Classifier Based on Burrows-Wheeler Transform
    Demongeot, Jacques
    Gardes, Joel
    Maldivi, Christophe
    Boisset, Denis
    Boufama, Kenza
    Touzouti, Imene
    COMPUTATION, 2023, 11 (08)
  • [32] Multithread Multistring Burrows-Wheeler Transform and Longest Common Prefix Array
    Bonizzoni, Paola
    Della Vedova, Gianluca
    Pirola, Yuri
    Previtali, Marco
    Rizzi, Raffaella
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2019, 26 (09) : 948 - 961
  • [33] Parallel architecture for DNA sequence inexact matching with Burrows-Wheeler Transform
    Xin, Yao
    Liu, Benben
    Min, Biao
    Li, Will X. Y.
    Cheung, Ray C. C.
    Fong, Anthony S.
    Chan, Ting Fung
    MICROELECTRONICS JOURNAL, 2013, 44 (08) : 670 - 682
  • [34] Computing the longest common prefix array based on the Burrows-Wheeler transform
    Beller, Timo
    Gog, Simon
    Ohlebusch, Enno
    Schnattinger, Thomas
    JOURNAL OF DISCRETE ALGORITHMS, 2013, 18 : 22 - 31
  • [35] A bijective variant of the Burrows-Wheeler Transform using V-order
    Daykin, Jacqueline W.
    Smyth, W. F.
    THEORETICAL COMPUTER SCIENCE, 2014, 531 : 77 - 89
  • [36] Burrows-Wheeler compression: Principles and reflections
    Fenwick, Peter
    THEORETICAL COMPUTER SCIENCE, 2007, 387 (03) : 200 - 219
  • [37] Improvements to Burrows-Wheeler compression algorithm
    Deorowicz, S
    SOFTWARE-PRACTICE & EXPERIENCE, 2000, 30 (13) : 1465 - 1483
  • [38] Optimizing Burrows-Wheeler Transform-Based Sequence Alignment on Multicore Architectures
    Zhang, Jing
    Lin, Heshan
    Balaji, Pavan
    Feng, Wu-chun
    PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013), 2013, : 377 - 384
  • [39] O(N) Memory-Free Hardware Architecture for Burrows-Wheeler Transform
    Ghosh, Surajeet
    Ray, Sanchita Saha
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (07) : 2080 - 2093
  • [40] Efficient Maximal Repeat Finding Using the Burrows-Wheeler Transform and Wavelet Tree
    Kulekci, M. Oguzhan
    Vitter, Jeffrey Scott
    Xu, Bojian
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2012, 9 (02) : 421 - 429