Large deviations-based upper bounds on the expected relative length of longest common subsequences

被引:9
作者
Hauser, Raphael
Martinez, Servet
Matzinger, Heinrich
机构
[1] Univ Oxford, Comp Lab, Oxford OX1 3QD, England
[2] Univ Chile, CNRS, CMM, DIM, Santiago, Chile
[3] Univ Bielefeld, Fak Math, D-33501 Bielefeld, Germany
[4] Georgia Inst Technol, Atlanta, GA 30332 USA
基金
英国工程与自然科学研究理事会;
关键词
longest common subsequence problem; Chvatal-Sankoff constant; upper bound; large deviation theory; Monte Carlo simulation;
D O I
10.1239/aap/1158685004
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Consider the random variable L-n defined as the length of a longest common subsequence of two random strings of length n and whose random characters are independent and identically distributed over a finite alphabet. Chvatal and Sankoff showed that the limit gamma = lim(n-->infinity)E[L-n]/n is well defined. The exact value of this constant is not known, but various methods for the computation of upper and lower bounds have been discussed in the literature. Even so, high-precision bounds are hard to come by. In this paper we discuss how. large deviation theory can be used to derive a consistent sequence of upper bounds, (q(m))(m is an element of N), on gamma, and how Monte Carlo simulation can be used in theory to compute estimates, (q) over cap (m), of the q(m) such that, for given Xi > 0 and Lambda is an element of (0, 1), we have P[gamma < <(q)over cap> < gamma + Xi] >= Lambda. In other words, with high probability the result is an upper bound that approximates gamma to high precision. We establish O((1 - Lambda)(-1) Xi(-(4+epsilon))) as a theoretical upper bound on the complexity of computing <(q)over cap>(m) to the given level of accuracy and confidence. Finally, we discuss a practical heuristic based on our theoretical approach and discuss its empirical behavior.
引用
收藏
页码:827 / 852
页数:26
相关论文
共 31 条
[1]   Longest increasing subsequences: From patience sorting to the Baik-Deift-Johansson theorem [J].
Aldous, D ;
Diaconis, P .
BULLETIN OF THE AMERICAN MATHEMATICAL SOCIETY, 1999, 36 (04) :413-432
[2]   THE RATE OF CONVERGENCE OF THE MEAN LENGTH OF THE LONGEST COMMON SUBSEQUENCE [J].
Alexander, Kenneth S. .
ANNALS OF APPLIED PROBABILITY, 1994, 4 (04) :1074-1082
[3]  
[Anonymous], 1975, J APPL PROBAB
[4]  
APOSTOLICO A, 1993, LECT NOTES COMPUT SC, V684
[5]   THE ERDOS-RENYI LAW IN DISTRIBUTION, FOR COIN TOSSING AND SEQUENCE MATCHING [J].
ARRATIA, R ;
GORDON, L ;
WATERMAN, MS .
ANNALS OF STATISTICS, 1990, 18 (02) :539-570
[6]   2 MOMENTS SUFFICE FOR POISSON APPROXIMATIONS - THE CHEN-STEIN METHOD [J].
ARRATIA, R ;
GOLDSTEIN, L ;
GORDON, L .
ANNALS OF PROBABILITY, 1989, 17 (01) :9-25
[7]   THE ERDOS-RENYI STRONG LAW FOR PATTERN-MATCHING WITH A GIVEN PROPORTION OF MISMATCHES [J].
ARRATIA, R ;
WATERMAN, MS .
ANNALS OF PROBABILITY, 1989, 17 (03) :1152-1169
[8]   `A PHASE TRANSITION FOR THE SCORE IN MATCHING RANDOM SEQUENCES ALLOWING DELETIONS [J].
Arratia, Richard ;
Waterman, Michael S. .
ANNALS OF APPLIED PROBABILITY, 1994, 4 (01) :200-225
[9]  
Azuma K., 1967, TOHOKU MATH J, V19, P357, DOI DOI 10.2748/TMJ/1178243286
[10]   Bounding the expected length of longest common subsequences and forests [J].
BaezaYates, RA ;
Gavaldà, R ;
Navarro, G ;
Scheihing, R .
THEORY OF COMPUTING SYSTEMS, 1999, 32 (04) :435-452