Succincter Text Indexing with Wildcards

被引:0
|
作者
Thachuk, Chris [1 ]
机构
[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1W5, Canada
关键词
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We study the problem of indexing text with wildcard positions, motivated by the challenge of aligning sequencing data to large genomes that contain millions of single nucleotide polymorphisms (SNPs)-positions known to differ between individuals. SNPs modeled as wildcards can lead to more informed and biologically relevant alignments. We improve the space complexity of previous approaches by giving a succinct index requiring (2 + o(1)) n log sigma + O(n) + O(d log n) + O(k log k) bits for a text of length n over an alphabet of size s containing d groups of k wildcards. The new index is particularly favourable for larger alphabets and comparable for smaller alphabets, such as DNA. A key to the space reduction is a result we give showing how any compressed suffix array can be supplemented with auxiliary data structures occupying O(n) + O(d log n d) bits to also support efficient dictionary matching queries. We present a new query algorithm for our wildcard index that greatly reduces the query working space to O(dm + m log n) bits, where m is the length of the query. We note that compared to previous results this reduces the working space by two orders of magnitude when aligning short read data to the Human genome.
引用
收藏
页码:27 / 40
页数:14
相关论文
共 50 条
  • [1] Succinct Text Indexing with Wildcards
    Tam, Alan
    Wu, Edward
    Lam, Tak-Wah
    Yiu, Siu-Ming
    STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5721 : 39 - 50
  • [2] Compressed Text Indexing with Wildcards
    Hon, Wing-Kai
    Ku, Tsung-Han
    Shah, Rahul
    Thankachan, Sharma V.
    Vitter, Jeffrey Scott
    STRING PROCESSING AND INFORMATION RETRIEVAL, 2011, 7024 : 267 - +
  • [3] Compressed text indexing with wildcards
    Hon, Wing-Kai
    Ku, Tsung-Han
    Shah, Rahul
    Thankachan, Sharma V.
    Vitter, Jeffrey Scott
    JOURNAL OF DISCRETE ALGORITHMS, 2013, 19 (19) : 23 - 29
  • [4] String Indexing for Patterns with Wildcards
    Bille, Philip
    Li Gortz, Inge
    Vildhoj, Hjalte Wedel
    Vind, Soren
    THEORY OF COMPUTING SYSTEMS, 2014, 55 (01) : 41 - 60
  • [5] String Indexing for Patterns with Wildcards
    Philip Bille
    Inge Li Gørtz
    Hjalte Wedel Vildhøj
    Søren Vind
    Theory of Computing Systems, 2014, 55 : 41 - 60
  • [6] Less Space: Indexing for Queries with Wildcards
    Lewenstein, Moshe
    Munro, J. Ian
    Raman, Venkatesh
    Thankachan, Sharma V.
    ALGORITHMS AND COMPUTATION, 2013, 8283 : 89 - 99
  • [7] Less space: Indexing for queries with wildcards
    Lewenstein, Moshe
    Munro, J. Ian
    Raman, Venkatesh
    Thankachan, Sharma V.
    THEORETICAL COMPUTER SCIENCE, 2014, 557 : 120 - 127
  • [8] Compressed indexes for text with wildcards
    Thachuk, Chris
    THEORETICAL COMPUTER SCIENCE, 2013, 483 : 22 - 35
  • [9] Succincter
    Patrascu, Mihai
    PROCEEDINGS OF THE 49TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, 2008, : 305 - 313
  • [10] Dynamic "Succincter"
    Li, Tianxiao
    Liang, Jingxun
    Yu, Huacheng
    Zhou, Renfei
    2023 IEEE 64TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, FOCS, 2023, : 1715 - 1733