A survey of document image word spotting techniques

Cited by: 83
Authors
Giotis, Angelos P. [1, 2]
Sfikas, Giorgos [2]
Gatos, Basilis [2]
Nikou, Christophoros [1]
Affiliations
[1] Univ Ioannina, Dept Comp Sci & Engn, Ioannina, Greece
[2] Natl Ctr Sci Res Demokritos, Computat Intelligence Lab, Inst Informat & Telecommun, GR-15310 Athens, Greece
Keywords
Word spotting; Retrieval; Document indexing; Features; Representation; Relevance feedback; HIDDEN MARKOV-MODELS; HANDWRITTEN DOCUMENTS; TEXT LINE; SEGMENTATION; RETRIEVAL; RECOGNITION; CHARACTER; ONLINE; EXTRACTION; SIMILARITY;
DOI
10.1016/j.patcog.2017.02.023
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vast collections of documents available in image format need to be indexed for information retrieval purposes. In this framework, word spotting is an alternative solution to optical character recognition (OCR), which is rather inefficient for recognizing text of degraded quality and unknown fonts usually appearing in printed text, or writing style variations in handwritten documents. Over the past decade there has been a growing interest in addressing document indexing using word spotting which is reflected by the continuously increasing number of approaches. However, there exist very few comprehensive studies which analyze the various aspects of a word spotting system. This work aims to review the recent approaches as well as fill the gaps in several topics with respect to the related works. The nature of texts and inherent challenges addressed by word spotting methods are thoroughly examined. After presenting the core steps which compose a word spotting system, we investigate the use of retrieval enhancement techniques based on relevance feedback which improve the retrieved results. Finally, we present the datasets which are widely used for word spotting, we describe the evaluation standards and measures applied for performance assessment and discuss the results achieved by the state of the art. (C) 2017 Elsevier Ltd. All rights reserved.
Pages: 310-332
Page count: 23