HMM word graph based keyword spotting in handwritten document images

Cited by: 41
Authors
Toselli, Alejandro Hector [1]
Vidal, Enrique [1]
Romero, Veronica [1]
Frinken, Volkmar [2,3,4]
Affiliations
[1] Univ Politecn Valencia, Camino Vera S-N, E-46022 Valencia, Spain
[2] Kyushu Univ, Fac Informat Sci & Elect Engn, Fukuoka 812, Japan
[3] Univ Calif Davis, Elect & Comp Engn, Davis, CA 95616 USA
[4] ONU Technol Inc, San Jose, CA USA
Funding
European Union Horizon 2020
Keywords
Keyword spotting; Handwritten text recognition; Word graph; Posterior probability; Confidence score; Interactive transcription; Historical documents; Confidence measures; Segmentation; Recognition; Algorithm; Filler; Model
DOI
10.1016/j.ins.2016.07.063
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Line-level keyword spotting (KWS) is presented on the basis of frame-level word posterior probabilities. These posteriors are obtained using word graphs derived from the recognition process of a full-fledged handwritten text recognizer based on hidden Markov models and N-gram language models. This approach has several advantages. First, since it uses a holistic, segmentation-free technology, it does not require any kind of word or character segmentation. Second, the use of language models allows the context of each spotted word to be taken into account, thereby considerably increasing KWS accuracy. Third, the proposed KWS scores are based on true posterior probabilities, taking into account all (or most) possible word segmentations of the input image. These scores are properly bounded and normalized. This mathematically clean formulation lends itself to smooth, threshold-based keyword queries which, in turn, permit comfortable trade-offs between search precision and recall. Experiments are carried out on several historical collections of handwritten text images, as well as on a well-known data set of modern English handwritten text. According to the empirical results, the proposed approach achieves KWS results comparable to those obtained with the recently introduced "BLSTM neural networks KWS" approach and clearly outperforms the popular, state-of-the-art "Filler HMM" KWS method. Overall, the results clearly support all the advantages claimed above. © 2016 Elsevier Inc. All rights reserved.
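The abstract describes scoring a keyword in a text line by its frame-level posterior probability, computed from a word graph over all competing word segmentations. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes each word-graph node carries a frame boundary and each edge carries a combined optical-model plus language-model log-score. It runs the standard forward-backward algorithm to obtain edge posteriors and sums the posteriors of same-word edges covering each frame. As an assumption of this sketch, it takes the maximum over frames as the line-level score. The helper names (`frame_word_posteriors`, `line_score`, `precision_recall`) and the toy graph are hypothetical.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) of an iterable of log-values."""
    vals = list(values)
    if not vals:
        return NEG_INF
    m = max(vals)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in vals))


def frame_word_posteriors(node_frame, edges, start, end):
    """Frame-level word posteriors from a word graph via forward-backward.

    node_frame: dict node -> frame boundary (start node at 0, end node at T).
    edges: list of (src, dst, word, log_score); log_score is ASSUMED to be the
        combined optical + language-model log-score of the edge, and every
        edge must satisfy node_frame[src] < node_frame[dst].
    Returns {word: [P(word | frame f, line) for f in 0..T-1]}; an edge from
    src to dst covers frames node_frame[src] .. node_frame[dst] - 1.
    """
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for e in edges:
        incoming[e[1]].append(e)
        outgoing[e[0]].append(e)
    # Frames strictly increase along edges, so sorting by frame is a topological order.
    order = sorted(node_frame, key=node_frame.get)

    alpha = {n: NEG_INF for n in node_frame}  # forward log-scores
    alpha[start] = 0.0
    for n in order:
        if n != start:
            alpha[n] = logsumexp(alpha[s] + sc for s, _, _, sc in incoming[n])

    beta = {n: NEG_INF for n in node_frame}  # backward log-scores
    beta[end] = 0.0
    for n in reversed(order):
        if n != end:
            beta[n] = logsumexp(sc + beta[d] for _, d, _, sc in outgoing[n])

    log_z = alpha[end]  # log of the total score of all paths (segmentations)
    num_frames = node_frame[end]
    post = defaultdict(lambda: [0.0] * num_frames)
    for s, d, w, sc in edges:
        edge_post = math.exp(alpha[s] + sc + beta[d] - log_z)
        for f in range(node_frame[s], node_frame[d]):
            post[w][f] += edge_post
    return dict(post)


def line_score(post, word):
    """Line-level score (assumed here): maximum frame-level posterior of word."""
    return max(post.get(word, [0.0]))


def precision_recall(scores, relevant, threshold):
    """Precision/recall of (line_id, word) pairs whose score >= threshold.

    Convention chosen for this sketch: precision is 1.0 when nothing is retrieved.
    """
    retrieved = {k for k, v in scores.items() if v >= threshold}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Toy word graph (hypothetical numbers): 4 frames, two competing segmentations.
node_frame = {0: 0, 1: 2, 2: 4}
edges = [
    (0, 1, "the", math.log(0.5)),
    (0, 1, "they", math.log(0.3)),
    (1, 2, "cat", math.log(1.0)),
    (0, 2, "the", math.log(0.2)),
]
post = frame_word_posteriors(node_frame, edges, start=0, end=2)
print(line_score(post, "the"))  # approximately 0.7 (0.5 + 0.2 on frames 0-1)
```

Because every path through the graph covers every frame, the posteriors of competing words sum to 1 at each frame, and the line scores stay within [0, 1]. A single threshold passed to `precision_recall` therefore sweeps the precision-recall trade-off: raising it keeps only high-confidence detections, typically increasing precision at the cost of recall.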
Pages: 497-518
Page count: 22