Compressed Text Indexing with Wildcards

被引:0
|
作者
Hon, Wing-Kai [1 ]
Ku, Tsung-Han [1 ]
Shah, Rahul [2 ]
Thankachan, Sharma V. [2 ]
Vitter, Jeffrey Scott [3 ]
机构
[1] Natl Tsing Hua Univ, Hsinchu, Taiwan
[2] Louisiana State Univ, Baton Rouge, LA 70803 USA
[3] Univ Kansas, Lawrence, KS 66045 USA
关键词
SUFFIX ARRAYS;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Let T = T-1 phi T-k1(2)phi(k2) ... phi T-kd(d+1) be a text of total length n, where characters of each T-i are chosen from an alphabet Sigma of size sigma, and phi denotes a wildcard symbol. The text indexing with wildcards problem is to index. T such that when we are given a query pattern P we can locate the occurrences of P in T efficiently. This problem has been applied in indexing genomic sequences that contain single-nucleotide polymorphisms (SNP) because SNP can be modeled as wildcards. Recently Tam et al. (2009) and Thachuk (2011) have proposed succinct indexes for this problem. In this paper, we present the first compressed index for this problem, which takes only nH(h) + o(n log sigma) + O(d log n) bits space, where H-h is the hth-order empirical entropy (h = o(log(sigma) n)) of T.
引用
收藏
页码:267 / +
页数:3
相关论文
共 50 条
  • [31] Text Mining for Indexing
    Gelernter, Judith
    Lesk, Michael
    JCDL 09: PROCEEDINGS OF THE 2009 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES, 2009, : 467 - 467
  • [32] COMPARISON OF PHILOSOPHY OF INDEXING TEXT WITH THAT OF INDEXING STRUCTURAL FORMULAE
    KENT, AK
    ASLIB PROCEEDINGS, 1967, 19 (11): : 364 - &
  • [33] Video indexing in the wavelet compressed domain
    Panchanathan, S
    Mandal, MK
    Aboulnasr, T
    1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 3, 1998, : 546 - 550
  • [34] Compressed indexing and local alignment of DNA
    Lam, T. W.
    Sung, W. K.
    Tam, S. L.
    Wong, C. K.
    Yiu, S. M.
    BIOINFORMATICS, 2008, 24 (06) : 791 - 797
  • [35] Indexing and retrieval of the MPEG compressed video
    Kobla, V
    Doermann, D
    JOURNAL OF ELECTRONIC IMAGING, 1998, 7 (02) : 294 - 307
  • [36] Image and video indexing in the compressed domain
    Mandal, MK
    Idris, F
    Panchanathan, S
    MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS II, 1997, 3229 : 2 - 13
  • [37] Indexing and retrieval of the MPEG compressed video
    University of Maryland, Lab. for Lang. and Media Processing, Center for Automation Research, College Park, MD 20742-3275, United States
    J Electron Imaging, 2 (294-306):
  • [38] Image/video indexing in the compressed domain
    Idris, F
    Panchanathan, S
    1996 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING - CONFERENCE PROCEEDINGS, VOLS I AND II: THEME - GLIMPSE INTO THE 21ST CENTURY, 1996, : 903 - 906
  • [39] INDEXING AND SEARCHING IN STATUTORY TEXT
    CORBETT, M
    LAW LIBRARY JOURNAL, 1992, 84 (04): : 759 - 767
  • [40] The gold text indexing engine
    Barbara, D
    Mehrotra, S
    Vallabhaneni, P
    PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, 1996, : 172 - 179