Flexible Sequence Matching Technique: Application to Word Spotting in Degraded Documents

Times cited: 7
Authors
Mondal, Tanmoy [1]
Ragot, Nicolas [1]
Ramel, Jean-Yves [1]
Pal, Umapada [2]
Affiliations
[1] Univ Tours, Lab Informat, Tours, France
[2] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata, India
Source
2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR) | 2014
Keywords
Elastic matching; Minimal variance matching (MVM); Dynamic time warping (DTW); Continuous dynamic programming (CDP); Word spotting; Handwritten documents; Degraded historical document; Sequence alignment; SHAPE;
DOI
10.1109/ICFHR.2014.43
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, a new sequence-matching algorithm, called Flexible Sequence Matching (FSM), is proposed. FSM combines several abilities of other sequence-matching algorithms (especially DTW, CDP and MVM), which can be configured depending on the application domain. Its generality and robustness come from its ability to find subsequences (as in CDP), to skip outliers inside the matched sequences (as in MVM), and to match multiple elements with a single one (as in CDP and DTW). These properties make it highly suitable for robust word spotting. More precisely, FSM can retrieve a query inside a text line or a piece of a line. This is useful when word segmentation is inaccurate or when only line segmentation information is available. Furthermore, its skipping capability makes FSM less sensitive to local variations in the spelling of words and to local degradation effects. Finally, its multiple-matching facilities (many-to-one and one-to-many matching) are useful when the target and query sequences differ in length because of variability in scale. We demonstrate the superiority of the proposed FSM algorithm in specific cases such as incorrect word segmentation and word-level local variations. Experiments on the handwritten George Washington dataset and on historical typewritten document images gave quite promising results.
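The abstract describes FSM as combining three abilities in one matcher: CDP-style subsequence search (the query may start and end anywhere in a line), MVM-style skipping of outlier elements inside the match, and DTW/CDP-style many-to-one and one-to-many matching. The Python sketch below illustrates how these three abilities can coexist in a single dynamic-programming recurrence. It is not the authors' FSM formulation: the function name `flexible_match`, the absolute-difference local cost on scalar features, the fixed per-element `skip_penalty`, the `max_skip` bound, and skipping only target (line) elements are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def flexible_match(
    query: Sequence[float],
    target: Sequence[float],
    skip_penalty: float = 1.0,
    max_skip: int = 2,
) -> Tuple[float, int, int]:
    """Hypothetical sketch: return (cost, start, end) of the best match of
    `query` inside `target`. Not the FSM formulation from the paper."""
    n, m = len(query), len(target)
    if n == 0 or m == 0:
        raise ValueError("query and target must be non-empty")

    def local_cost(i: int, j: int) -> float:
        # Assumed local distance between scalar features.
        return abs(query[i] - target[j])

    # D[i][j]: best cost of matching query[:i+1] with query[i] aligned to target[j].
    D = [[math.inf] * m for _ in range(n)]
    # S[i][j]: target index where that best partial match begins.
    S = [[-1] * m for _ in range(n)]

    for j in range(m):
        # Free start (CDP-like subsequence search): the match may begin anywhere.
        D[0][j] = local_cost(0, j)
        S[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Many-to-one: query[i-1] and query[i] both align to target[j].
            best, start = D[i - 1][j], S[i - 1][j]
            # One-to-many: query[i] also covers target[j-1].
            if j > 0 and D[i][j - 1] < best:
                best, start = D[i][j - 1], S[i][j - 1]
            # Diagonal step from target[k], skipping the j-k-1 target elements
            # in between (MVM-like outlier skipping), each at skip_penalty.
            for k in range(j - 1, max(-1, j - 2 - max_skip), -1):
                cand = D[i - 1][k] + skip_penalty * (j - k - 1)
                if cand < best:
                    best, start = cand, S[i - 1][k]
            D[i][j] = best + local_cost(i, j)
            S[i][j] = start

    # Free end: the match may stop at any target position.
    end = min(range(m), key=lambda j: D[n - 1][j])
    return D[n - 1][end], S[n - 1][end], end


query = [1, 2, 3]
line = [0, 0, 1, 2, 9, 3, 0, 0]  # 9 stands in for a local degradation
print(flexible_match(query, line, skip_penalty=0.5))              # (0.5, 2, 5)
print(flexible_match(query, line, skip_penalty=0.5, max_skip=0))  # (1.0, 2, 3)
```

In this toy example the query is located inside a longer "line" without any word segmentation, and the skip step lets the match jump over the outlier `9` for a penalty of 0.5. With `max_skip=0` the sketch falls back to DTW-like steps only and settles for a worse match that ends before the degraded element. The recurrence runs in O(n * m * max_skip) time.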
Pages: 210-215
Number of pages: 6