Consensus-based clustering for document image segmentation

Cited by: 0
Authors
Soumyadeep Dey
Jayanta Mukherjee
Shamik Sural
Affiliation
[1] Indian Institute of Technology Kharagpur, Department of Computer Science and Engineering
Source
International Journal on Document Analysis and Recognition (IJDAR) | 2016 / Vol. 19
Keywords
Document analysis; Segmentation; Clustering; Hypothesis testing; Stroke width
DOI
Not available
Abstract
Segmentation of a document image plays an important role in automatic document processing. In this paper, we propose a consensus-based clustering approach for document image segmentation. In this method, the foreground regions of a document image are grouped into a set of primitive blocks, and a set of features is extracted from them. Similarities among the blocks are computed on each feature using a hypothesis test-based similarity measure. Based on the consensus of these similarities, clustering is performed on the primitive blocks. This clustering approach is used iteratively with a classifier to label each primitive block. Experimental results show the effectiveness of the proposed method. It is further shown in the experimental results that the dependency of classification performance on the training data is significantly reduced.
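
The clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the hypothesis test or the consensus rule, so the sketch assumes a Welch-style two-sample test on means with a normal-approximation threshold of 1.96, a vote count `min_agree` as the consensus rule, and union-find to merge similar blocks. The feature names `stroke_width` and `height` and all sample values are hypothetical. The iterative use with a classifier is not covered.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def similar_on_feature(a, b, critical=1.96):
    """Assumed test: Welch-style statistic on two samples.

    Returns True when the hypothesis of equal means is NOT rejected
    (normal approximation; the threshold is an illustrative choice).
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return mean(a) == mean(b)
    return abs(mean(a) - mean(b)) / se < critical


def consensus_clusters(blocks, min_agree):
    """Group blocks whose per-feature tests agree on at least min_agree features.

    blocks: list of dicts mapping a feature name to a list of samples.
    Returns a sorted list of clusters, each a list of block indices.
    """
    parent = list(range(len(blocks)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(blocks)), 2):
        votes = sum(
            similar_on_feature(blocks[i][f], blocks[j][f]) for f in blocks[i]
        )
        if votes >= min_agree:  # consensus reached: merge the two blocks
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(blocks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical primitive blocks: two text-like blocks and one heading-like block.
blocks = [
    {"stroke_width": [2.0, 2.1, 1.9, 2.0], "height": [10.0, 10.5, 9.5, 10.0]},
    {"stroke_width": [2.05, 1.95, 2.0, 2.1], "height": [10.2, 9.8, 10.1, 9.9]},
    {"stroke_width": [6.0, 6.2, 5.8, 6.1], "height": [30.0, 31.0, 29.0, 30.5]},
]
print(consensus_clusters(blocks, min_agree=2))
```

Requiring agreement on every feature (`min_agree=2` here) is the strictest consensus rule; lowering it would merge blocks that match on only some features.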
Pages: 351-368
Page count: 17