Computing the Expected Edit Distance from a String to a Probabilistic Finite-State Automaton

被引:2
作者
Calvo-Zaragoza, Jorgc [1 ]
Oncina, Jose [1 ]
de la Higucra, Colin [2 ]
机构
[1] Univ Alicante, Dept Software & Comp Syst, Carretera San Vicente S-N, Alicante, Spain
[2] Univ Nantes, Lab Informat Nantes Atlantique, UMR 6241, Nantes, France
关键词
Edit distance; probabilistic finite state automata; median string;
D O I
10.1142/S0129054117400093
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In a number of fields, it is necessary to compare a witness string with a distribution. One possibility is to compute the probability of the string for that distribution. Another, giving a more global view, is to compute the expected edit distance from a string randomly drawn to the witness string. This number is often used to measure the performance of a prediction, the goal then being to return the median string, or the string with smallest expected distance. To be able to measure this, computing the distance between a hypothesis and that distribution is necessary. This paper proposes two solutions for computing this value, when the distribution is defined with a probabilistic finite state automaton. The first is exact but has a cost which can be exponential in the length of the input string, whereas the second is a fully polynomial-time randomized schema.
引用
收藏
页码:603 / 621
页数:19
相关论文
共 28 条
[1]   A new iterative algorithm for computing a quality approximate median of strings based on edit operations [J].
Abreu, J. ;
Rico-Juan, J. R. .
PATTERN RECOGNITION LETTERS, 2014, 36 :74-80
[2]  
Allauzen C., 2009, CORR
[3]  
[Anonymous], 1997, Statistical methods for speech recognition
[4]  
Balle B, 2013, THESIS
[5]  
Becerra-Bonache L, 2008, J MACH LEARN RES, V9, P1841
[6]  
Berstel Jr Jean, 1988, Rational Series and Their Languages, EATCS Monographs on Theoretical Computer Science
[7]   Computing the Expected Edit Distance from a String to a PFA [J].
Calvo-Zaragoza, Jorge ;
de la Higuera, Colin ;
Oncina, Jose .
Implementation and Application of Automata, 2016, 9705 :39-50
[8]   Topology of strings: Median string is NP-complete [J].
de la Higuera, C ;
Casacuberta, F .
THEORETICAL COMPUTER SCIENCE, 2000, 230 (1-2) :39-48
[9]  
de la Higuera C., 2013, P FSMNLP
[10]   The most probable string: an algorithmic study [J].
De la Higuera, Colin ;
Oncina, Jose .
JOURNAL OF LOGIC AND COMPUTATION, 2014, 24 (02) :311-330