Word Spotting in Cursive Handwritten Documents Using Modified Character Shape Codes

Cited by: 0
Author
Sarkar, Sayantan [1]
Affiliation
[1] NIT Rourkela, Dept Elect Engn, Rourkela, Orissa, India
Source
ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY, VOL 3 | 2013 / Vol. 178
Keywords
Word Spot; Handwritten Documents; Character Shape Code; Word Shape Token; Modified Character Shape Code; Levenshtein Distance; Query Search; Segmentation;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
A large body of handwritten English paper documents is of historical and scientific importance, but a computer cannot read paper documents directly. The closest practical way to index them is to store a digital image of each document, so that a large database of document images can replace the paper originals. The text contained in each image, however, still cannot be recognised directly by the computer. This paper applies word spotting based on the Modified Character Shape Code to handwritten English document images, enabling quick and efficient query search for words in a database of document images. It differs from other word-spotting techniques in that it applies two levels of selection to the word segments that are matched against the search query: first by word size, and then by the character shape code of the query. This makes the process faster and more efficient and reduces the need for multiple pre-processing steps.
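The two-level selection described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it works on text strings rather than segmented word images, and the three shape classes (ascender "A", descender "g", x-height "x"), the function names, and the tolerance parameters `size_tol` and `max_dist` are all assumptions made for illustration. The record does not define the Modified Character Shape Code itself. Levenshtein distance, listed in the keywords, is used here to compare shape codes.

```python
# Hypothetical shape classes, loosely in the spirit of character shape codes.
# The paper's actual Modified Character Shape Code is not given in this record.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")


def shape_code(word):
    """Map each character to an assumed shape class: A, g, or x."""
    code = []
    for ch in word:
        if ch.isupper() or ch in ASCENDERS:
            code.append("A")   # rises above the x-height
        elif ch in DESCENDERS:
            code.append("g")   # drops below the baseline
        else:
            code.append("x")   # stays within the x-height band
    return "".join(code)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def spot(query, candidates, size_tol=1, max_dist=1):
    """Two-level selection: filter by word size, then by shape-code distance."""
    q_code = shape_code(query)
    # Level 1: keep only candidates whose length is close to the query's.
    sized = [w for w in candidates if abs(len(w) - len(query)) <= size_tol]
    # Level 2: keep candidates whose shape code is near the query's code.
    return [w for w in sized if levenshtein(shape_code(w), q_code) <= max_dist]


words = ["paper", "document", "pages", "image", "query"]
matches = spot("paper", words)
```

In this sketch the size filter discards "document" before any shape code is computed, which is the kind of saving the abstract attributes to the first selection level. "pages" survives both levels because it shares the assumed shape code of "paper", so a real system would still need a finer final comparison.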
Pages: 269-278
Page count: 10