Compressed Text Indexing with Wildcards

被引:0
|
作者
Hon, Wing-Kai [1 ]
Ku, Tsung-Han [1 ]
Shah, Rahul [2 ]
Thankachan, Sharma V. [2 ]
Vitter, Jeffrey Scott [3 ]
机构
[1] Natl Tsing Hua Univ, Hsinchu, Taiwan
[2] Louisiana State Univ, Baton Rouge, LA 70803 USA
[3] Univ Kansas, Lawrence, KS 66045 USA
关键词
SUFFIX ARRAYS;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Let T = T-1 phi T-k1(2)phi(k2) ... phi T-kd(d+1) be a text of total length n, where characters of each T-i are chosen from an alphabet Sigma of size sigma, and phi denotes a wildcard symbol. The text indexing with wildcards problem is to index. T such that when we are given a query pattern P we can locate the occurrences of P in T efficiently. This problem has been applied in indexing genomic sequences that contain single-nucleotide polymorphisms (SNP) because SNP can be modeled as wildcards. Recently Tam et al. (2009) and Thachuk (2011) have proposed succinct indexes for this problem. In this paper, we present the first compressed index for this problem, which takes only nH(h) + o(n log sigma) + O(d log n) bits space, where H-h is the hth-order empirical entropy (h = o(log(sigma) n)) of T.
引用
收藏
页码:267 / +
页数:3
相关论文
共 50 条
  • [1] Compressed text indexing with wildcards
    Hon, Wing-Kai
    Ku, Tsung-Han
    Shah, Rahul
    Thankachan, Sharma V.
    Vitter, Jeffrey Scott
    JOURNAL OF DISCRETE ALGORITHMS, 2013, 19 (19) : 23 - 29
  • [2] Succinct Text Indexing with Wildcards
    Tam, Alan
    Wu, Edward
    Lam, Tak-Wah
    Yiu, Siu-Ming
    STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5721 : 39 - 50
  • [3] Succincter Text Indexing with Wildcards
    Thachuk, Chris
    COMBINATORIAL PATTERN MATCHING, 22ND ANNUAL SYMPOSIUM, CPM 2011, 2011, 6661 : 27 - 40
  • [4] Compressed indexes for text with wildcards
    Thachuk, Chris
    THEORETICAL COMPUTER SCIENCE, 2013, 483 : 22 - 35
  • [5] Indexing compressed text
    Ferragina, P
    Manzini, G
    JOURNAL OF THE ACM, 2005, 52 (04) : 552 - 581
  • [6] Universal compressed text indexing
    Navarro, Gonzalo
    Prezza, Nicola
    THEORETICAL COMPUTER SCIENCE, 2019, 762 : 41 - 50
  • [7] String Indexing for Patterns with Wildcards
    Bille, Philip
    Li Gortz, Inge
    Vildhoj, Hjalte Wedel
    Vind, Soren
    THEORY OF COMPUTING SYSTEMS, 2014, 55 (01) : 41 - 60
  • [8] Alphabet-Independent Compressed Text Indexing
    Belazzougui, Djamal
    Navarro, Gonzalo
    ALGORITHMS - ESA 2011, 2011, 6942 : 748 - 759
  • [9] Alphabet-Independent Compressed Text Indexing
    Belazzougui, Djamal
    Navarro, Gonzalo
    ACM TRANSACTIONS ON ALGORITHMS, 2014, 10 (04)
  • [10] String Indexing for Patterns with Wildcards
    Philip Bille
    Inge Li Gørtz
    Hjalte Wedel Vildhøj
    Søren Vind
    Theory of Computing Systems, 2014, 55 : 41 - 60