Efficient segmentation-free keyword spotting in historical document collections

Cited by: 78
Authors
Rusinol, Marcal [1]
Aldavert, David [1]
Toledo, Ricardo [1]
Llados, Josep [1]
Affiliations
[1] Univ Autonoma Barcelona, Dept Ciencies Comp, Comp Vis Ctr, E-08193 Barcelona, Spain
Keywords
Historical documents; Keyword spotting; Segmentation-free; Dense SIFT features; Latent semantic analysis; Product quantization; Word retrieval; Text line
DOI
10.1016/j.patcog.2014.08.021
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we present an efficient segmentation-free word spotting method for historical document collections that follows the query-by-example paradigm. We use a patch-based framework in which local patches are described by a bag-of-visual-words model built on SIFT descriptors. By projecting the patch descriptors into a topic space with latent semantic analysis and compressing them with product quantization, we index the document information efficiently in terms of both memory and time. The proposed method is evaluated on four collections of historical documents and performs well in both handwritten and typewritten scenarios, outperforming recent state-of-the-art keyword spotting approaches. © 2014 Elsevier Ltd. All rights reserved.
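The indexing idea in the abstract (bag-of-visual-words patch descriptors, projection to a topic space, product quantization) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the random stand-in histograms, the plain truncated SVD without any term weighting, and all parameter values (64 visual words, 8 topics, 4 sub-quantizers, 16 centroids) are assumptions chosen only to keep the example small.

```python
import numpy as np

def lsa_project(X, k):
    """Project rows of X (one histogram per patch) onto the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k].T                      # (vocabulary size, k)
    return X @ basis, basis

def kmeans(data, n_centroids, n_iter, rng):
    """Plain Lloyd's k-means, used to learn one sub-codebook."""
    centroids = data[rng.choice(len(data), n_centroids, replace=False)].copy()
    for _ in range(n_iter):
        d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(n_centroids):
            members = data[labels == c]
            if len(members):              # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(0)
    return centroids

def pq_train(Z, m, n_centroids, rng, n_iter=10):
    """Split the topic vectors into m sub-vectors and learn a codebook for each."""
    return [kmeans(s, n_centroids, n_iter, rng) for s in np.split(Z, m, axis=1)]

def pq_encode(Z, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    subs = np.split(Z, len(codebooks), axis=1)
    codes = [((s[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1)
             for s, cb in zip(subs, codebooks)]
    return np.stack(codes, axis=1).astype(np.uint8)

def pq_search(q, codes, codebooks):
    """Asymmetric distances: uncompressed query against compressed database entries."""
    m = len(codebooks)
    sub_q = np.split(q, m)
    # tables[j, c] = squared distance from query sub-vector j to centroid c
    tables = np.stack([((cb - sq) ** 2).sum(1) for sq, cb in zip(sub_q, codebooks)])
    dists = tables[np.arange(m), codes].sum(1)
    return np.argsort(dists), dists

# Stand-in data: 200 patches, each a histogram over 64 visual words.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 64)).astype(float)

Z, basis = lsa_project(X, k=8)                     # topic-space descriptors
codebooks = pq_train(Z, m=4, n_centroids=16, rng=rng)
codes = pq_encode(Z, codebooks)                    # 4 bytes per patch
order, dists = pq_search(Z[0], codes, codebooks)   # rank patches for a query patch
```

In this sketch each patch is stored as four one-byte codes instead of a 64-bin histogram, and a query is answered with table lookups rather than full distance computations, which is the kind of memory and time saving the abstract refers to.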
Pages: 545-555
Number of pages: 11