Why greed works for shortest common superstring problem

被引:12
作者
Ma, Bin [1 ]
机构
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
基金
加拿大自然科学与工程研究理事会;
关键词
Shortest common superstring; Smoothed analysis; Polynomial time approximation scheme; APPROXIMATION ALGORITHM; SMOOTHED ANALYSIS;
D O I
10.1016/j.tcs.2009.09.014
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The shortest common superstring problem (SCS) has been extensively studied for its applications in string compression and DNA sequence assembly. Although the problem is known to be Max-SNP hard, the simple greedy algorithm performs extremely well in practice. To explain the good performance, previous researchers proved that the greedy algorithm is asymptotically optimal on random instances. Unfortunately, the practical instances in DNA sequence assembly are very different from the random instances. In this paper we explain the good performance of the greedy algorithm by using the smoothed analysis. We show that, for any given instance 1 of SCS, the average approximation ratio of the greedy algorithm on a small random perturbation oft is 1 + o(1). The perturbation defined in the paper is small and naturally represents the mutations of the DNA sequence during evolution. Due to the existence of the uncertain nucleotides in the output of a DNA sequencing machine, we also proposed the shortest common superstring with wildcards problem (SCSW). We prove that in the worst case SCSW cannot be approximated within the ratio n(1/7-epsilon), while the greedy algorithm still has 1 + o(1) smoothed approximation ratio. (C) 2009 Elsevier B.V. All rights reserved.
引用
收藏
页码:5374 / 5381
页数:8
相关论文
共 22 条
  • [1] Andoni A, 2008, LECT NOTES COMPUT SC, V5125, P357, DOI 10.1007/978-3-540-70575-8_30
  • [2] [Anonymous], 1995, Introduction to computational biology: maps, sequences and genomes
  • [3] ARMEN C, 1996, P 7 ANN S COMB PATT, P87
  • [4] Free bits, PCPs, and nonapproximability - Towards tight results
    Bellare, M
    Goldreich, O
    Sudan, M
    [J]. SIAM JOURNAL ON COMPUTING, 1998, 27 (03) : 804 - 915
  • [5] LINEAR-APPROXIMATION OF SHORTEST SUPERSTRINGS
    BLUM, A
    JIANG, T
    LI, M
    TROMP, J
    YANNAKAKIS, M
    [J]. JOURNAL OF THE ACM, 1994, 41 (04) : 630 - 647
  • [6] Rotations of periodic strings and short superstrings
    Breslauer, D
    Jiang, T
    Jiang, ZG
    [J]. JOURNAL OF ALGORITHMS, 1997, 24 (02) : 340 - 353
  • [7] Greedy algorithms for the shortest common superstring that are asymptotically optimal
    Frieze, A
    Szpankowski, W
    [J]. ALGORITHMICA, 1998, 21 (01) : 21 - 36
  • [8] ON FINDING MINIMAL LENGTH SUPERSTRINGS
    GALLANT, J
    MAIER, D
    STORER, JA
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1980, 20 (01) : 50 - 58
  • [9] The greedy algorithm for shortest superstrings
    Kaplan, H
    Shafrir, N
    [J]. INFORMATION PROCESSING LETTERS, 2005, 93 (01) : 13 - 17
  • [10] Li M., 1990, Proceedings. 31st Annual Symposium on Foundations of Computer Science (Cat. No.90CH2925-6), P125, DOI 10.1109/FSCS.1990.89531