A segmentation-free approach for keyword search in historical typewritten documents

Cited by: 22
Authors
Gatos, B [1]
Konidaris, T [1]
Ntzios, K [1]
Pratikakis, I [1]
Perantonis, SJ [1]
Affiliation
[1] Natl Ctr Sci Res Demokritos, Computat Intelligence Lab, Inst Informat & Telecommun, GR-15310 Athens, Greece
Source
EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS | 2005
DOI
10.1109/ICDAR.2005.30
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel segmentation-free approach for keyword search in historical typewritten documents combining image preprocessing, synthetic data creation, word spotting and user's feedback technologies. Our aim is to search for keywords typed by the user in a large collection of digitized typewritten historical documents. The proposed method is based on: (i) image preprocessing for image binarization and enhancement, noisy border and frame removal, orientation and skew correction; (ii) creation of synthetic image words from keywords typed by the user; (iii) word segmentation using dynamic parameters; (iv) efficient feature extraction for each image word and (v) a retrieval procedure that is optimized by user's feedback. Experimental results prove the efficiency of the proposed approach.
Pages: 54-58
Page count: 5
Related papers
15 items in total
[1] [Anonymous], 2004, Proc. of the 2004 ACM Symposium on Applied Computing (SAC '04)
[2] OMNIDOCUMENT TECHNOLOGIES [J]. BOKSER, M. PROCEEDINGS OF THE IEEE, 1992, 80 (07): 1066-1078
[3] Doermann D, 1997, PROC INT CONF DOC, P314, DOI 10.1109/ICDAR.1997.619863
[4] A binary-tree-based OCR technique for machine-printed characters [J]. Gatos, B; Papamarkos, N; Chamzas, C. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 1997, 10 (04): 403-412
[5] Gatos B., 2000, International Journal on Digital Libraries, V3, P77
[6] Gatos B, 2004, LECT NOTES COMPUT SC, V3163, P102
[7] Guillevic D, 1997, PROC INT CONF DOC, P544, DOI 10.1109/ICDAR.1997.620559
[8] A SYSTEM FOR INTERPRETATION OF LINE DRAWINGS [J]. KASTURI, R; BOW, ST; ELMASRI, W; SHAH, J; GATTIKER, JR; MOKATE, UB. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1990, 12 (10): 978-992
[9] LU Y, 2001, P 6 INT C DOC AN REC, P10
[10] Manmatha R., 1997, INTELLIGENT MULTIMED, P43