Efficient segmentation-free keyword spotting in historical document collections

Cited by: 77
Authors
Rusinol, Marcal [1]
Aldavert, David [1]
Toledo, Ricardo [1]
Llados, Josep [1]
Affiliations
[1] Univ Autonoma Barcelona, Dept Ciencies Comp, Comp Vis Ctr, E-08193 Barcelona, Spain
Keywords
Historical documents; Keyword spotting; Segmentation-free; Dense SIFT features; Latent semantic analysis; Product quantization; Word retrieval; Text line
DOI
10.1016/j.patcog.2014.08.021
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we present an efficient segmentation-free word spotting method, applied in the context of historical document collections, that follows the query-by-example paradigm. We use a patch-based framework where local patches are described by a bag-of-visual-words model powered by SIFT descriptors. By projecting the patch descriptors to a topic space with the latent semantic analysis technique and compressing the descriptors with the product quantization method, we are able to efficiently index the document information both in terms of memory and time. The proposed method is evaluated using four different collections of historical documents, achieving good performance in both handwritten and typewritten scenarios. The results outperform recent state-of-the-art keyword spotting approaches. (C) 2014 Elsevier Ltd. All rights reserved.
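The indexing idea summarized in the abstract (bag-of-visual-words patch descriptors, projection to a topic space, product quantization) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The histograms are synthetic stand-ins for dense-SIFT bag-of-visual-words descriptors, the latent semantic analysis step is reduced to a plain truncated SVD with no term weighting, and all function names and parameter values (`lsa_project`, `pq_train`, 8 topics, 4 sub-vectors, 16 centroids) are assumptions chosen for illustration.

```python
import numpy as np

def lsa_project(histograms, n_topics):
    """Project row-wise BoVW histograms onto the top n_topics SVD directions."""
    _, _, vt = np.linalg.svd(histograms, full_matrices=False)
    basis = vt[:n_topics].T                      # shape (n_words, n_topics)
    return histograms @ basis, basis

def kmeans(data, k, n_iter, rng):
    """Plain Lloyd's k-means; returns a (k, dim) centroid matrix."""
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):                     # keep old centroid if cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def pq_train(vectors, n_subvectors, n_centroids, rng, n_iter=10):
    """Learn one k-means codebook per block of dimensions."""
    blocks = np.split(vectors, n_subvectors, axis=1)
    return [kmeans(block, n_centroids, n_iter, rng) for block in blocks]

def pq_encode(vectors, codebooks):
    """Replace each block of each vector by the index of its nearest centroid."""
    blocks = np.split(vectors, len(codebooks), axis=1)
    codes = [((b[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
             for b, cb in zip(blocks, codebooks)]
    return np.stack(codes, axis=1).astype(np.uint8)

def pq_search(query, codes, codebooks):
    """Rank database items by asymmetric distance (exact query, quantized items)."""
    q_blocks = np.split(query, len(codebooks))
    # table[m, c] = squared distance from query block m to centroid c
    table = np.stack([((cb - qb) ** 2).sum(axis=1)
                      for qb, cb in zip(q_blocks, codebooks)])
    dists = table[np.arange(len(codebooks)), codes].sum(axis=1)
    return np.argsort(dists)

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 patches, vocabulary of 64 visual words.
histograms = rng.poisson(1.0, size=(200, 64)).astype(float)
topics, basis = lsa_project(histograms, n_topics=8)
codebooks = pq_train(topics, n_subvectors=4, n_centroids=16, rng=rng)
codes = pq_encode(topics, codebooks)             # 4 bytes per patch
ranking = pq_search(topics[0], codes, codebooks) # query-by-example with patch 0
```

In this sketch each patch is stored as four one-byte codes instead of a 64-bin histogram, and a query is compared against the whole collection through a small lookup table, which is the kind of memory and time saving the abstract refers to.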
Pages: 545-555
Page count: 11