Consensus-based clustering for document image segmentation

Cited by: 2
Authors
Dey, Soumyadeep [1]
Mukherjee, Jayanta [1]
Sural, Shamik [1]
Affiliations
[1] Indian Inst Technol Kharagpur, Dept Comp Sci & Engn, Kharagpur 721302, W Bengal, India
Keywords
Document analysis; Segmentation; Clustering; Hypothesis testing; Stroke width; Text extraction; Classification; Stamp
DOI
10.1007/s10032-016-0275-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Segmentation of a document image plays an important role in automatic document processing. In this paper, we propose a consensus-based clustering approach for document image segmentation. In this method, the foreground regions of a document image are grouped into a set of primitive blocks, and a set of features is extracted from them. Similarities among the blocks are computed on each feature using a hypothesis test-based similarity measure. Based on the consensus of these similarities, clustering is performed on the primitive blocks. This clustering approach is used iteratively with a classifier to label each primitive block. Experimental results show the effectiveness of the proposed method. It is further shown in the experimental results that the dependency of classification performance on the training data is significantly reduced.
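The clustering step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature names (`stroke_width`, `height`), the use of Welch's t statistic with a fixed critical value as the hypothesis test, the vote threshold `min_votes` as the consensus rule, and the merging of agreeing blocks by union-find. The iterative coupling with a classifier mentioned in the abstract is not shown.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t statistic for two samples (each needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0 if mean(a) == mean(b) else math.inf
    return abs(mean(a) - mean(b)) / se


def similar(a, b, t_crit=2.0):
    """Assumed similarity rule: fail to reject 'equal means' when t < t_crit."""
    return welch_t(a, b) < t_crit


def consensus_clusters(blocks, min_votes, t_crit=2.0):
    """Group blocks that are judged similar on at least `min_votes` features.

    blocks: list of dicts mapping a feature name to a list of measurements.
    Returns one integer cluster label per block.
    """
    n = len(blocks)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Each feature casts one vote on whether blocks i and j match.
            votes = sum(
                similar(blocks[i][f], blocks[j][f], t_crit) for f in blocks[i]
            )
            if votes >= min_votes:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Hypothetical primitive blocks: two text-like blocks and one larger block.
example_blocks = [
    {"stroke_width": [2.0, 2.1, 1.9, 2.0], "height": [10, 11, 10, 9]},
    {"stroke_width": [2.1, 2.0, 2.0, 1.9], "height": [10, 10, 11, 9]},
    {"stroke_width": [6.0, 6.2, 5.9, 6.1], "height": [30, 31, 29, 30]},
]
print(consensus_clusters(example_blocks, min_votes=2))  # prints [0, 0, 1]
```

Requiring agreement on several features before merging is one simple way to read "consensus"; a fixed critical value stands in for a proper significance level and would need to be chosen per test and sample size.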
Pages: 351-368
Number of pages: 18