Linear Time Algorithms for Generalizations of the Longest Common Substring Problem

被引:19
作者
Arnold, Michael [2 ]
Ohlebusch, Enno [1 ]
机构
[1] Univ Ulm, Inst Theoret Comp Sci, D-89069 Ulm, Germany
[2] Capgemini SD&M AG, D-70597 Stuttgart, Germany
关键词
Suffix arrays; Longest common substring; Longest common repeat; String mining; Repeat analysis; CONSTRUCTION;
D O I
10.1007/s00453-009-9369-1
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
In its simplest form, the longest common substring problem is to find a longest substring common to two or multiple strings. Using (generalized) suffix trees, this problem can be solved in linear time and space. A first generalization is the k -common substring problem: Given m strings of total length n, for all k with 2 <= k <= m simultaneously find a longest substring common to at least k of the strings. It is known that the k-common substring problem can also be solved in O(n) time (Hui in Proc. 3rd Annual Symposium on Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pp. 230-243, Springer, Berlin, 1992). A further generalization is the k -common repeated substring problem: Given m strings T ((1)),T ((2)),......T ((m)) of total length n and m positive integers x (1),.........,x (m) , for all k with 1 <= k <= m simultaneously find a longest string omega for which there are at least k strings T-(i1), T-(i2),......, T-(ik) (1 <= i(1) < i(2) < ....... < i(k) <= m) such that omega occurs at least x(ij) times in T-(ij) for each j with 1 <= j <= k. (For x (1)= ....=x (m) =1, we have the k-common substring problem.) In this paper, we present the first O(n) time algorithm for the k-common repeated substring problem. Our solution is based on a new linear time algorithm for the k-common substring problem.
引用
收藏
页码:806 / 818
页数:13
相关论文
共 13 条
[1]  
Apostolico A., 1985, Combinatorial Algorithms on Words, P85
[2]   RECURSIVE STAR-TREE PARALLEL DATA STRUCTURE [J].
BERKMAN, O ;
VISHKIN, U .
SIAM JOURNAL ON COMPUTING, 1993, 22 (02) :221-242
[3]  
Fischer J, 2007, LECT NOTES COMPUT SC, V4614, P459
[4]  
Gusfield D., 1999, ALGORITHMS STRINGS T
[5]  
HUI LCK, 1992, LECT NOTES COMPUT SC, V644, P230
[6]  
Kärkkäinen J, 2003, LECT NOTES COMPUT SC, V2719, P943
[7]  
Kasai T., 2001, Combinatorial Pattern Matching. 12th Annual Symposium, CPM 2001. Proceedings (Lecture Notes in Computer Science Vol. 2089), P181
[8]  
Kim DK, 2003, LECT NOTES COMPUT SC, V2676, P186
[9]  
Knuth D. E., 1977, SIAM Journal on Computing, V6, P323, DOI 10.1137/0206024
[10]  
Ko P, 2003, LECT NOTES COMPUT SC, V2676, P200