A faster and more accurate heuristic for cyclic edit distance computation

被引:12
作者
Ayad, Lorraine A. K. [1 ]
Barton, Carl [2 ]
Pissis, Solon P. [1 ]
机构
[1] Kings Coll London, Dept Informat, London WC2R 2LS, England
[2] European Bioinformat Inst, Wellcome Trust Genome Campus, Hinxton CB10 1SD, England
基金
英国工程与自然科学研究理事会;
关键词
Cyclic edit distance; Circular sequences; Cyclic strings; Chain code; q-gram distance; APPROXIMATE; SEQUENCE; ALGORITHMS;
D O I
10.1016/j.patrec.2017.01.018
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Sequence comparison is the core computation of many applications involving textual representations of data. Edit distance is the most widely used measure to quantify the similarity of two sequences. Edit distance can be defined as the minimal total cost of a sequence of edit operations to transform one sequence into the other; for a sequence x of length m and a sequence y of length n, it can be computed in time O(mn). In many applications, it is common to consider sequences with circular structure: for instance, the orientation of two images or the leftmost position of two linearised circular DNA sequences may be irrelevant. To this end, an algorithm to compute the cyclic edit distance in time 0(mn log m) was proposed (Maes, 2003 [18]) and several heuristics have been proposed to speed up this computation. Recently, a new algorithm based on q-grams was proposed for circular sequence comparison (Grossi et al., 2016 [13]). We extend this algorithm for cyclic edit distance computation and show that this new heuristic is faster and more accurate than the state of the art. The aim of this letter is to give visibility to this idea in the pattern recognition community. (C) 2017 The Authors. Published by Elsevier B.V.
引用
收藏
页码:81 / 87
页数:7
相关论文
共 29 条
[1]  
[Anonymous], TECHNICAL REPORT
[2]  
[Anonymous], 1998, MNIST DATABASE HANDW
[3]  
[Anonymous], INF PROCESS LETT
[4]  
Barton Carl, 2015, Language and Automata Theory and Applications. 9th International Conference, LATA 2015. Proceedings: LNCS 8977, P85, DOI 10.1007/978-3-319-15579-1_6
[5]  
Barton C., 2015, MATH STRUCT COMP SCI, P1
[6]   Accurate and Efficient Methods to Improve Multiple Circular Sequence Alignment [J].
Barton, Carl ;
Iliopoulos, Costas S. ;
Kundu, Ritu ;
Pissis, Solon P. ;
Retha, Ahmad ;
Vayani, Fatima .
EXPERIMENTAL ALGORITHMS, SEA 2015, 2015, 9125 :247-258
[7]   Fast algorithms for approximate circular string matching [J].
Barton, Carl ;
Iliopoulos, Costas S. ;
Pissis, Solon P. .
ALGORITHMS FOR MOLECULAR BIOLOGY, 2014, 9
[8]   APPLICATIONS OF APPROXIMATE STRING-MATCHING TO 2D SHAPE-RECOGNITION [J].
BUNKE, H ;
BUHLER, U .
PATTERN RECOGNITION, 1993, 26 (12) :1797-1812
[9]   Thematic Minireview Series on Circular Proteins [J].
Craik, David J. ;
Allewell, Norma M. .
JOURNAL OF BIOLOGICAL CHEMISTRY, 2012, 287 (32) :26999-27000
[10]  
Crochemore M., 2007, Algorithms on Strings, DOI DOI 10.1017/CBO9780511546853