APPROXIMATE BOYER-MOORE STRING MATCHING

Cited by: 48
Authors
TARHIO, J
UKKONEN, E
Affiliation
[1] Univ of Helsinki, Helsinki
Keywords
STRING MATCHING; EDIT DISTANCE; BOYER-MOORE ALGORITHM; K-MISMATCHES PROBLEM; K-DIFFERENCES PROBLEM
DOI
10.1137/0222018
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. The generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the problem in expected time O(kn(1/(m - k) + k/c)), where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with at most k differences (insertions, deletions, changes). Experimental evaluation of the algorithms is reported, showing that the new algorithms are often significantly faster than the earlier ones. Both algorithms are functionally equivalent to the Horspool version of the Boyer-Moore algorithm when k = 0.
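The abstract states the k mismatches problem but not how a Boyer-Moore skip can remain safe when up to k mismatches are allowed. The Python sketch below illustrates one such construction: after scanning a window right to left (stopping at the (k+1)-th mismatch), the shift is the minimum, over the k + 1 rightmost text characters of the window, of a per-position bad-character table capped at m - k. It is our own reconstruction from the problem statement, not the paper's pseudocode; the function names (`build_shift_tables`, `approx_bm_k_mismatches`), 0-based indexing and dictionary-based tables are hypothetical, and the sketch makes no claim to the paper's expected-time bound or its k differences variant.

```python
from typing import Dict, List


def build_shift_tables(pattern: str, k: int) -> List[Dict[str, int]]:
    """Hypothetical helper: one shift table per rightmost window position.

    tables[q][a] is the smallest shift s in 1..m-k-1 that would place a
    pattern character equal to a under the text character that currently
    sits q positions left of the window end. Characters absent from the
    dict fall back to the maximal safe shift m - k.
    """
    m = len(pattern)
    tables: List[Dict[str, int]] = []
    for q in range(k + 1):
        table: Dict[str, int] = {}
        # Visit s from largest to smallest so the smallest valid shift wins.
        for s in range(m - k - 1, 0, -1):
            table[pattern[m - 1 - q - s]] = s
        tables.append(table)
    return tables


def approx_bm_k_mismatches(text: str, pattern: str, k: int) -> List[int]:
    """Hypothetical sketch: start indices of all length-m windows of text
    that differ from pattern in at most k positions (Hamming distance)."""
    m, n = len(pattern), len(text)
    if not 0 <= k < m:
        raise ValueError("expected 0 <= k < len(pattern)")
    tables = build_shift_tables(pattern, k)
    max_shift = m - k
    matches: List[int] = []
    j = m - 1  # text index under the last pattern character
    while j < n:
        start = j - m + 1
        # Compare right to left; give up once a (k+1)-th mismatch is seen.
        mismatches = 0
        for i in range(m - 1, -1, -1):
            if text[start + i] != pattern[i]:
                mismatches += 1
                if mismatches > k:
                    break
        if mismatches <= k:
            matches.append(start)
        # An alignment shifted by s <= m-k-1 still covers the k+1 rightmost
        # text characters of this window; with at most k mismatches, at
        # least one of them must match the pattern character above it.
        shift = max_shift
        for q in range(k + 1):
            shift = min(shift, tables[q].get(text[j - q], max_shift))
        j += shift
    return matches
```

For example, `approx_bm_k_mismatches("abcabdabe", "abc", 1)` returns `[0, 3, 6]`. With k = 0 the single table reduces to Horspool's bad-character shift (distance from the end to the rightmost occurrence of the text character among the first m - 1 pattern positions, else m), which is consistent with the abstract's remark about the k = 0 case.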
Pages: 243-260
Number of pages: 18