A combined strategy of analysis for the localization of heterogeneous form fields in ancient pre-printed records

Cited by: 0
Authors
Aurélie Lemaitre
Jean Camillerapp
Cérès Carton
Bertrand Coüasnon
Affiliation
[1] Univ Rennes - CNRS - IRISA
Source
International Journal on Document Analysis and Recognition (IJDAR) | 2018 / Vol. 21
Keywords
Historical documents; Field localization; Heterogeneous layout; Rule-based system; Word spotting; Unsupervised clustering
DOI
Not available
Abstract
This paper deals with the localization of handwritten fields in old pre-printed registers. The images present the usual difficulties of old and damaged documents, and text extraction is further complicated by the strong interaction between handwritten and printed text. In addition, in many collections, the structure of the forms varies according to the origin of the documents. This work is applied to a database of Mexican marriage records, which was published for a competition at the HIP 2013 workshop and is publicly available. In this paper, we show the strengths and limitations of the empirical method that was submitted for the competition. We then present a method that combines a logical description of the contents of the documents with the result of an automatic analysis of the physical properties of the collection. The particularity of this analysis is that it does not require any ground truth. We show that this combined strategy can locate 97.2% of handwritten fields. The proposed approach is generalizable and could be applied to other databases.
Pages: 269-282
Number of pages: 13