The subsequence composition of a string

被引:1
作者
Apostolico, Alberto [1 ,2 ]
Cunial, Fabio [2 ]
机构
[1] Univ Padua, Dipartimento Ingn Informaz, Padua, Italy
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30318 USA
关键词
Constrained subsequences; Special subsequences; Suffix graph; Core equivalence classes; String complexity measures; SEQUENCES;
D O I
10.1016/j.tcs.2009.07.017
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Words that appear as constrained subsequences in a text-string are considered as possible indicators of the host string structure. hence also as a possible means of sequence comparison and classification. The constraint consists of imposing a bound on the number omega of positions in the text that may intervene between any two consecutive characters of a subsequence. A subset of such to-sequences is then characterized that consists, in intuitive terms, of sequences that could not be enriched with more characters without losing some occurrence in the text. A compact spatial representation is then proposed for these representative sequences, within which a number of parameters can be defined and measured. In the final part of the paper, such parameters are empirically analyzed on a small collection of text-strings endowed with various degrees of structure. (C) 2009 Elsevier B.V. All rights reserved.
引用
收藏
页码:4360 / 4371
页数:12
相关论文
共 16 条
[1]   EFFICIENT DETECTION OF QUASIPERIODICITIES IN STRINGS [J].
APOSTOLICO, A ;
EHRENFEUCHT, A .
THEORETICAL COMPUTER SCIENCE, 1993, 119 (02) :247-265
[2]  
APOSTOLICO A., 2005, P ISMB 05 INT SYST M, P9
[3]   COMPLETE INVERTED FILES FOR EFFICIENT TEXT RETRIEVAL AND ANALYSIS [J].
BLUMER, A ;
BLUMER, J ;
HAUSSLER, D ;
MCCONNELL, R ;
EHRENFEUCHT, A .
JOURNAL OF THE ACM, 1987, 34 (03) :578-595
[4]   Three great challenges for half-century-old computer science [J].
Brooks, FP .
JOURNAL OF THE ACM, 2003, 50 (01) :25-26
[5]  
CHATTARAJ A, 2005, THEORET COMPUT SCI, V335
[6]   Special factors in biological strings [J].
Colosimo, A ;
de Luca, A .
JOURNAL OF THEORETICAL BIOLOGY, 2000, 204 (01) :29-46
[7]  
Gratzer G., 1998, GEN LATTICE THEORY
[8]   3 APPROACHES TO QUANTITATIVE DEFINITION OF INFORMATION [J].
KOLMOGOROV, AN .
INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 1968, 2 (02) :157-+
[9]   COMPLEXITY OF FINITE SEQUENCES [J].
LEMPEL, A ;
ZIV, J .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1976, 22 (01) :75-81
[10]   The similarity metric [J].
Li, M ;
Chen, X ;
Li, X ;
Ma, B ;
Vitányi, PMB .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2004, 50 (12) :3250-3264