Keyword spotting for self-training of BLSTM NN based handwriting recognition systems

Cited by: 24
Authors
Frinken, Volkmar [1 ]
Fischer, Andreas [2 ]
Baumgartner, Markus [3 ]
Bunke, Horst [3 ]
Affiliations
[1] Univ Autonoma Barcelona, Comp Vis Ctr, E-08193 Bellaterra, Barcelona, Spain
[2] Concordia Univ, Ctr Pattern Recognit & Machine Intelligence, Montreal, PQ H3G 1M8, Canada
[3] Univ Bern, Inst Comp Sci & Appl Math, CH-3012 Bern, Switzerland
Funding
Swiss National Science Foundation
Keywords
Document retrieval; Keyword spotting; Handwriting recognition; Neural networks; Semi-supervised learning; PERFORMANCE; IMPROVE; MODEL;
DOI
10.1016/j.patcog.2013.06.030
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The automatic transcription of unconstrained continuous handwritten text requires well-trained recognition systems. The semi-supervised paradigm introduces the concept of using not only labeled data but also unlabeled data in the learning process. Unlabeled data can be gathered at little or no cost. Hence it has the potential to reduce the need for labeling training data, a tedious and costly process. Given a weak initial recognizer trained on labeled data, self-training can be used to recognize unlabeled data and add words that were recognized with high confidence to the training set for re-training. This process is not trivial, and great care is required in selecting the elements that are added to the training set. In this paper, we propose to use a bidirectional long short-term memory neural network handwriting recognition system for keyword spotting in order to select new elements. A set of experiments shows the high potential of self-training for bootstrapping handwriting recognition systems, both for modern and historical handwriting, and demonstrates the benefits of using keyword spotting over previously published self-training schemes. © 2013 Elsevier Ltd. All rights reserved.
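The self-training loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the BLSTM recognizer and the keyword-spotting selection step are replaced here by a toy nearest-centroid classifier on one-dimensional features, and the names `train`, `recognize`, `self_train`, the confidence score, and the `threshold` value are all assumptions made for illustration only.

```python
from statistics import mean


def train(labeled):
    """Fit a toy nearest-centroid 'recognizer': one mean feature value per label.

    Stand-in for training a real recognizer (an assumption, not the paper's BLSTM NN).
    """
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def recognize(model, x):
    """Return (label, confidence); the confidence score is a hypothetical margin in [0, 1]."""
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, 1.0
    d1 = abs(x - model[ranked[0]])  # distance to the closest centroid
    d2 = abs(x - model[ranked[1]])  # distance to the runner-up centroid
    conf = 1.0 - d1 / d2 if d2 > 0 else 0.0
    return best, conf


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Repeatedly label the unlabeled pool and retrain on confidently recognized items."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train(labeled)
    for _ in range(max_rounds):
        accepted, rest = [], []
        for x in pool:
            label, conf = recognize(model, x)
            if conf >= threshold:
                accepted.append((x, label))  # high confidence: add to the training set
            else:
                rest.append(x)  # low confidence: keep unlabeled for a later round
        if not accepted:
            break  # nothing new was selected, so retraining would change nothing
        labeled.extend(accepted)
        pool = rest
        model = train(labeled)
    return model, labeled, pool


if __name__ == "__main__":
    seed = [(0.0, "a"), (10.0, "b")]
    model, labeled, pool = self_train(seed, [0.5, 9.5, 5.0, 1.0])
    print(model, len(labeled), pool)
```

The selection rule is the part the abstract flags as delicate: a threshold that is too low adds mislabeled samples to the training set, while one that is too high adds nothing new. The paper proposes keyword spotting as the selection mechanism, which this sketch does not attempt to reproduce.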
Pages: 1073-1082
Number of pages: 10