String matching with inversions and translocations in linear average time (most of the time)

被引:16
作者
Grabowski, Szymon [1 ]
Faro, Simone [2 ]
Giaquinta, Emanuele [2 ]
机构
[1] Tech Univ Lodz, Dept Comp Engn, PL-90924 Lodz, Poland
[2] Univ Catania, Dipartimento Matemat & Informat, I-95125 Catania, Italy
关键词
Approximate string matching; Algorithms; Bioinformatics;
D O I
10.1016/j.ipl.2011.02.015
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We present an efficient algorithm for finding all approximate occurrences of a given pattern p of length m in a text t of length n allowing for translocations of equal length adjacent factors and inversions of factors. The algorithm is based on an efficient filtering method and has an O(nm max(alpha, beta))-time complexity in the worst case and O(max(alpha, beta, sigma))-space complexity, where alpha and beta are respectively the maximum length of the factors involved in any translocation and inversion, and a is the alphabet size. Moreover we show that our algorithm has an O(n) average time complexity, whenever sigma = Omega(log m/log log(1-epsilon) m), for epsilon > 0. Experiments show that the proposed algorithm achieves very good results in practical cases. (C) 2011 Elsevier B.V. All rights reserved.
引用
收藏
页码:516 / 520
页数:5
相关论文
共 7 条