Efficient and scalable indexing techniques for biological sequence data

被引:0
作者
Halachev, Mihail [1 ]
Shiri, Nematollaah [1 ]
Thamildurai, Anand [1 ]
机构
[1] Concordia Univ, Dept Comp Sci & Software Engn, Montreal, PQ, Canada
来源
BIOINFORMATICS RESEARCH AND DEVELOPMENT, PROCEEDINGS | 2007年 / 4414卷
基金
加拿大自然科学与工程研究理事会;
关键词
biological databases; sequences; indexing; suffix trees;
D O I
暂无
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
We investigate indexing techniques for sequence data, crucial in a wide variety of applications, where efficient, scalable, and versatile search algorithms are required. Recent research has focused on suffix trees (ST) and suffix arrays (SA) as desirable index representations. Existing solutions for very long sequences however provide either efficient index construction or efficient search, but not both. We propose a new ST representation, STTD64, which has reasonable construction time and storage requirement, and is efficient in search. We have implemented the construction and search algorithms for the proposed technique and conducted numerous experiments to evaluate its performance on various types of real sequence data. Our results show that while the construction time for STTD64 is comparable with current ST based techniques, it outperforms them in search. Compared to ESA, the best known SA technique, STTD64 exhibits slower construction time, but has similar space requirement and comparable search time. Unlike ESA, which is memory based, STTD64 is scalable and can handle very long sequences.
引用
收藏
页码:464 / +
页数:3
相关论文
共 28 条
[1]  
Abouelhoda M. I., 2004, Journal of Discrete Algorithms, V2, P53, DOI 10.1016/S1570-8667(03)00065-0
[2]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[3]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[4]   EFFICIENT IMPLEMENTATION OF SUFFIX TREES [J].
ANDERSSON, A ;
NILSSON, S .
SOFTWARE-PRACTICE & EXPERIENCE, 1995, 25 (02) :129-141
[5]  
[Anonymous], [No title captured], DOI DOI 10.1145/299432.299460
[6]   A theoretical and experimental study on the construction of suffix arrays in external memory [J].
Crauser, A ;
Ferragina, P .
ALGORITHMICA, 2002, 32 (01) :1-35
[7]   Alignment of whole genomes [J].
Delcher, AL ;
Kasif, S ;
Fleischmann, RD ;
Peterson, J ;
White, O ;
Salzberg, SL .
NUCLEIC ACIDS RESEARCH, 1999, 27 (11) :2369-2376
[8]  
DEMENTIEV R, 2005, P 7 WORKSH ALG ENG E
[9]   The string B-tree: A new data structure for string search in external memory and its applications [J].
Ferragina, P ;
Grossi, R .
JOURNAL OF THE ACM, 1999, 46 (02) :236-280
[10]   Efficient implementation of lazy suffix trees [J].
Giegerich, R ;
Kurtz, S ;
Stoye, J .
SOFTWARE-PRACTICE & EXPERIENCE, 2003, 33 (11) :1035-1049