Named Entity Recognition from Unstructured Handwritten Document Images

Cited by: 11
Authors
Adak, Chandranath [1]
Chaudhuri, Bidyut B. [2]
Blumenstein, Michael [1]
Affiliations
[1] Griffith Univ, Sch ICT, Gold Coast 4222, Australia
[2] Indian Stat Inst, CVPR Unit, Kolkata 700108, India
Source
PROCEEDINGS OF 12TH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (DAS 2016) | 2016
Keywords
BLSTM neural network; Document image analysis; Dual layer bagging; Information retrieval; Named entity recognition
DOI
10.1109/DAS.2016.15
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Named entity recognition is an important topic in the field of natural language processing, whereas in document image processing, such recognition is quite challenging without employing any linguistic knowledge. In this paper we propose an approach to detect named entities (NEs) directly from offline handwritten unstructured document images without explicit character/word recognition, and with very little aid from natural language and script rules. At the preprocessing stage, the document image is binarized, and then the text is segmented into words. The slant/skew/baseline corrections of the words are also performed. After preprocessing, the words are sent for NE recognition. We analyze the structural and positional characteristics of NEs and extract some relevant features from the word image. Then the BLSTM neural network is used for NE recognition. Our system also contains a post-processing stage to reduce the true NE rejection rate. The proposed approach produces encouraging results on both historical and modern document images, including those from an Australian archive, which are reported here for the very first time.
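The preprocessing steps named in the abstract (binarization, then segmentation of the text into words) can be illustrated with a minimal sketch. The abstract does not say which binarization or segmentation method the authors use, so Otsu thresholding and a column projection profile are stand-in assumptions here. The function names `otsu_threshold`, `binarize`, and `segment_words` and the `min_gap` parameter are hypothetical. The sketch handles a single text line and omits slant/skew/baseline correction, feature extraction, and the BLSTM stage.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def otsu_threshold(image: Image) -> int:
    """Return the gray level that maximizes between-class variance (Otsu).

    Assumption: the source does not name its binarization method.
    """
    hist = [0] * 256
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image: Image) -> Image:
    """Map dark pixels (ink) to 1 and light pixels (paper) to 0."""
    t = otsu_threshold(image)
    return [[1 if v <= t else 0 for v in row] for row in image]


def segment_words(binary: Image, min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split one text line into word spans (start, end), end exclusive.

    Columns without ink are gaps; gaps narrower than min_gap are treated
    as spacing inside a word and the neighbouring runs are merged.
    """
    width = len(binary[0]) if binary else 0
    col_ink = [sum(row[x] for row in binary) for x in range(width)]
    runs: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(col_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, width))
    words: List[Tuple[int, int]] = []
    for s, e in runs:
        if words and s - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], e)
        else:
            words.append((s, e))
    return words
```

Real handwritten pages would first need line segmentation and a `min_gap` tuned to the writer's spacing; the fixed value above is only illustrative.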
Pages: 375-380
Page count: 6