Features for word spotting in historical manuscripts

Cited by: 0
Authors
Rath, TM [1]
Manmatha, R [1]
Affiliations
[1] Univ Massachusetts, Ctr Intelligent Informat Retrieval, Multi Media Indexing & Retrieval Grp, Amherst, MA 01002 USA
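Source
SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS | 2003
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract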
For the transition from traditional to digital libraries, the large number of handwritten manuscripts that exist pose a great challenge. Easy access to such collections requires an index, which is currently created manually at great cost. Because automatic handwriting recognizers fail on historical manuscripts, the word spotting technique has been developed: the words in a collection are matched as images and grouped into clusters which contain all instances of the same word. By annotating "interesting" clusters, an index that links words to the locations where they occur can be built automatically. Due to the noise in historical documents, selecting the right features for matching words is crucial. We analyzed a range of features suitable for matching words using dynamic time warping (DTW), which aligns and compares sets of features extracted from two images. Each feature's individual performance was measured on a test set. With an average precision of 72%, a combination of features outperforms competing techniques in speed and precision.
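The DTW matching mentioned in the abstract can be illustrated with a minimal sketch. It assumes each word image has already been reduced to a sequence of per-column feature vectors. The function name `dtw_distance`, the squared Euclidean local cost, and the absence of any path-length normalization or band constraint are illustrative assumptions, not details taken from the paper.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping cost between two feature sequences.

    a, b: non-empty lists of equal-dimension feature vectors
    (hypothetically, one vector per column of a word image).
    Returns the accumulated cost of the cheapest alignment.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: squared Euclidean distance (an assumption).
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances
                                 cost[i][j - 1],      # b advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Two sequences that differ only by a stretched column, such as `[[0.0], [1.0], [2.0]]` and `[[0.0], [1.0], [1.0], [2.0]]`, align at zero cost under this sketch, which is the property that makes DTW tolerant of variation in handwriting width.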
Pages: 218-222
Page count: 5