Succincter Text Indexing with Wildcards

被引:0
|
作者
Thachuk, Chris [1 ]
机构
[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1W5, Canada
关键词
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We study the problem of indexing text with wildcard positions, motivated by the challenge of aligning sequencing data to large genomes that contain millions of single nucleotide polymorphisms (SNPs)-positions known to differ between individuals. SNPs modeled as wildcards can lead to more informed and biologically relevant alignments. We improve the space complexity of previous approaches by giving a succinct index requiring (2 + o(1)) n log sigma + O(n) + O(d log n) + O(k log k) bits for a text of length n over an alphabet of size s containing d groups of k wildcards. The new index is particularly favourable for larger alphabets and comparable for smaller alphabets, such as DNA. A key to the space reduction is a result we give showing how any compressed suffix array can be supplemented with auxiliary data structures occupying O(n) + O(d log n d) bits to also support efficient dictionary matching queries. We present a new query algorithm for our wildcard index that greatly reduces the query working space to O(dm + m log n) bits, where m is the length of the query. We note that compared to previous results this reduces the working space by two orders of magnitude when aligning short read data to the Human genome.
引用
收藏
页码:27 / 40
页数:14
相关论文
共 50 条
  • [21] Automatic Subject Indexing of Text
    Golub, Koraljka
    KNOWLEDGE ORGANIZATION, 2019, 46 (02): : 104 - 121
  • [22] Improved dynamic text indexing
    Ferragina, P
    Grossi, R
    JOURNAL OF ALGORITHMS, 1999, 31 (02) : 291 - 319
  • [23] FROM TEXT TO HYPERTEXT BY INDEXING
    SALMINEN, A
    TAGUESUTCLIFFE, J
    MCCLELLAN, C
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1995, 13 (01) : 69 - 99
  • [24] Universal compressed text indexing
    Navarro, Gonzalo
    Prezza, Nicola
    THEORETICAL COMPUTER SCIENCE, 2019, 762 : 41 - 50
  • [25] Online timestamped text indexing
    Amir, A
    Landau, GM
    Ukkonen, E
    INFORMATION PROCESSING LETTERS, 2002, 82 (05) : 253 - 259
  • [26] Automatic text segmentation and text recognition for video indexing
    Lienhart, R
    Effelsberg, W
    MULTIMEDIA SYSTEMS, 2000, 8 (01) : 69 - 81
  • [27] Automatic text segmentation and text recognition for video indexing
    Rainer Lienhart
    Wolfgang Effelsberg
    Multimedia Systems, 2000, 8 : 69 - 81
  • [28] Texture, text, and context of the folklore text vs indexing
    Jason, H
    JOURNAL OF FOLKLORE RESEARCH, 1997, 34 (03) : 221 - 225
  • [29] Reverse-Safe Text Indexing
    Bernardini G.
    Chen H.
    Fici G.
    Loukides G.
    Pissis S.P.
    1600, Association for Computing Machinery (26):
  • [30] DYNAMIC INDEXING FOR THE ANALYSIS OF LITERARY TEXT
    WEISS, H
    INTERNATIONAL CLASSIFICATION, 1991, 18 (04): : 200 - 204