An optimal algorithm for the maximum-density segment problem

被引:33
作者
Chung, KM
Lu, HI
机构
[1] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
[2] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 10764, Taiwan
关键词
bioinformatics; biological sequence analysis; maximum-density segment; slope selection; computational geometry; sequence algorithm; data structure;
D O I
10.1137/S0097539704440430
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We address a fundamental problem arising from analysis of biomolecular sequences. The input consists of two numbers w(min) and w(max) and a sequence S of n number pairs (a(i), w(i)) with w(i) > 0. Let segment S(i, j) of S be the consecutive subsequence of S between indices i and j. The density of S(i, j) is d(i, j) = (a(i) + a(i+1) + ... + a(j))/(w(i) + w(i+1) + ... + w(j)). The maximum-density segment problem is to find a maximum-density segment over all segments S(i, j) with w(min) = w(i) + w(i+1) + ... + w(j) = w(max). The best previously known algorithm for the problem, due to Goldwasser, Kao, and Lu [Proceedings of the Second International Workshop on Algorithms in Bioinformatics, R. Guigo and D. Gusfield, eds., Lecture Notes in Comput. Sci. 2452, Springer-Verlag, New York, 2002, pp. 157-171], runs in O(n log(w(max)-w(min) + 1)) time. In the present paper, we solve the problem in O(n) time. Our approach bypasses the complicated right-skew decomposition, introduced by Lin, Jiang, and Chao [J. Comput. System Sci., 65 (2002), pp. 570-586]. As a result, our algorithm has the capability to process the input sequence in an online manner, which is an important feature for dealing with genome-scale sequences. Moreover, for a type of input sequences S representable in O(m) space, we show how to exploit the sparsity of S and solve the maximum-density segment problem for S in O(m) time.
引用
收藏
页码:373 / 387
页数:15
相关论文
共 45 条
[1]  
Alexandrov N., 1998, P PAC S BIOC, P461
[2]   Isochores and the evolutionary genomics of vertebrates [J].
Bernardi, G .
GENE, 2000, 241 (01) :3-17
[3]   COMPOSITIONAL CONSTRAINTS AND GENOME EVOLUTION [J].
BERNARDI, G ;
BERNARDI, G .
JOURNAL OF MOLECULAR EVOLUTION, 1986, 24 (1-2) :1-11
[4]   Optimal slope selection via cuttings [J].
Bronnimann, H ;
Chazelle, B .
COMPUTATIONAL GEOMETRY-THEORY AND APPLICATIONS, 1998, 10 (01) :23-29
[5]   PATTERNS IN THE GENOME [J].
CHARLESWORTH, B .
CURRENT BIOLOGY, 1994, 4 (02) :182-184
[6]   AN OPTIMAL-TIME ALGORITHM FOR SLOPE SELECTION [J].
COLE, R ;
SALOWE, JS ;
STEIGER, WL ;
SZEMEREDI, E .
SIAM JOURNAL ON COMPUTING, 1989, 18 (04) :792-810
[7]   STATISTICAL-ANALYSIS OF VERTEBRATE SEQUENCES REVEALS THAT LONG GENES ARE SCARCE IN GC-RICH ISOCHORES [J].
DURET, L ;
MOUCHIROUD, D ;
GAUTIER, C .
JOURNAL OF MOLECULAR EVOLUTION, 1995, 40 (03) :308-317
[8]   RECOMBINATION AND MAMMALIAN GENOME EVOLUTION [J].
EYREWALKER, A .
PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 1993, 252 (1335) :237-243
[9]   EVIDENCE THAT BOTH G+C RICH AND G+C POOR ISOCHORES ARE REPLICATED EARLY AND LATE IN THE CELL-CYCLE [J].
EYREWALKER, A .
NUCLEIC ACIDS RESEARCH, 1992, 20 (07) :1497-1501