Using sequence compression to speedup probabilistic profile matching

被引:13
作者
Freschi, V [1 ]
Bogliolo, A [1 ]
机构
[1] Univ Urbino, Informat Sci & Technol Inst, I-61029 Urbino, Italy
关键词
D O I
10.1093/bioinformatics/bti323
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) to match a sequence of N characters against a profile of length P < N. Results: In this work, we exploit string compression techniques to speedup brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding.
引用
收藏
页码:2225 / 2229
页数:5
相关论文
共 15 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   Let sleeping files lie: Pattern matching in Z-compressed files [J].
Amir, A ;
Benson, G ;
Farach, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1996, 52 (02) :299-307
[3]  
[Anonymous], 1997, ALGORITHMS STRINGS T, DOI DOI 10.1017/CBO9780511574931
[4]  
Attwood T K, 2000, Brief Bioinform, V1, P45, DOI 10.1093/bib/1.1.45
[5]  
BECKSTETTE M, 2004, P GERM C BIOINF 2004
[6]   A subquadratic sequence alignment algorithm for unrestricted scoring matrices [J].
Crochemore, M ;
Landau, GM ;
Ziv-Ukelson, M .
SIAM JOURNAL ON COMPUTING, 2003, 32 (06) :1654-1673
[7]  
Dorohonceanu B, 2000, Proc Int Conf Intell Syst Mol Biol, V8, P128
[8]  
Gonnet G. H., 2004, Journal of Discrete Algorithms, V2, P3, DOI 10.1016/S1570-8667(03)00062-5
[9]  
Kärkkäinen J, 2000, LECT NOTES COMPUT SC, V1848, P195
[10]   The efficient computation of position-specific match scores with the fast Fourier transform [J].
Rajasekaran, S ;
Jin, X ;
Spouge, JL .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2002, 9 (01) :23-33