Bounding Edit Distance for similarity-based sequence classification on Structural Pattern Recognition

被引:4
作者
Rico-Juan J.R. [1 ]
Valero-Mas J.J. [1 ]
Iñesta J.M. [1 ]
机构
[1] Department of Software and Computing Systems, University of Alicante, Carretera San Vicente del Raspeig s/n, Alicante
来源
Applied Soft Computing Journal | 2020年 / 97卷
关键词
Classification; Edit distance; Efficient search; Nearest neighbor; Structural pattern recognition;
D O I
10.1016/j.asoc.2020.106778
中图分类号
学科分类号
摘要
Pattern Recognition tasks in the structural domain generally exhibit high accuracy results, but their time efficiency is quite low. Furthermore, this low performance is more pronounced when dealing with instance-based classifiers, since, for each query, the entire corpus must be evaluated to find the closest prototype. In this work we address this efficiency issue for the Nearest Neighbor classifier when data are encoded as two-dimensional code sequences, and more precisely strings and sequences of vectors. For this, a set of bounds is proposed in the distance metric that avoid the calculation of unnecessary distances. Results obtained prove the effectiveness of the proposal as it reduces the classification time in percentages between 80% and 90% for string representations and between 60% and 80% for data codified as sequences of vectors with respect to their corresponding non-optimized version of the classifier. © 2020 Elsevier B.V.
引用
收藏
相关论文
共 29 条
  • [1] Duda R.O., Hart P.E., Stork D.G., Pattern Classification, (2012)
  • [2] Calvo-Zaragoza J., Valero-Mas J.J., Rico-Juan J.R., Improving kNN multi-label classification in prototype selection scenarios using class proposals, Pattern Recognit., 48, 5, pp. 1608-1622, (2015)
  • [3] Calvo-Zaragoza J., Valero-Mas J.J., Rico-Juan J.R., Prototype generation on structural data using dissimilarity space representation, Neural Comput. Appl., 28, 9, pp. 2415-2424, (2017)
  • [4] Bunke H., Riesen K., Towards the unification of structural and statistical pattern recognition, Pattern Recognit. Lett., 33, 7, pp. 811-825, (2012)
  • [5] Mitchell T.M., Machine Learning, (1997)
  • [6] Prasath V., Alfeilat H.A.A., Lasassmeh O., Hassanat A., Tarawneh A.S., Distance and similarity measures effect on the performance of K-nearest neighbor classifier–A review, (2017)
  • [7] Levenshtein V.I., Binary codes capable of correcting deletions, insertions, and reversals, Sov. Phys. Dokl., 10, 8, pp. 707-710, (1966)
  • [8] Charikar M., Geri O., Kim M.P., Kuszmaul W., On estimating edit distance: Alignment, dimension reduction, and embeddings, 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, pp. 1-14, (2018)
  • [9] Valero-Mas J.J., Castellanos F.J., Data reduction in the string space for efficient kNN classification through space partitioning, Appl. Sci., 10, 10, (2020)
  • [10] Riesen K., Schmidt R., Online signature verification based on string edit distance, Int. J. Doc. Anal. Recognit., 22, 1, pp. 41-54, (2019)