Page Segmentation for Historical Handwritten Document Images Using Color and Texture Features

Cited by: 27
Authors
Chen, Kai [1]
Wei, Hao [1]
Hennebert, Jean [1, 2]
Ingold, Rolf [1]
Liwicki, Marcus [1, 3]
Affiliations
[1] Univ Fribourg, Dept Informat, DIVA Res Grp, CH-1700 Fribourg, Switzerland
[2] Univ Appl Sci, HES SO FR, CH-1705 Fribourg, Switzerland
[3] DFKI German Res Ctr Artificial Intelligence, Saarbrucken, Germany
Source
2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014
Funding
Swiss National Science Foundation
Keywords
page segmentation; historical document; layout analysis; feature selection
DOI
10.1109/ICFHR.2014.88
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a physical structure detection method for historical handwritten document images. We treat layout analysis as a pixel-labeling problem: by classifying each pixel as periphery, background, text block, or decoration, we achieve high-quality segmentation without assuming any specific topology or shape. Various color and texture features, such as color variance, smoothness, Laplacian, Local Binary Patterns, and the Gabor Dominant Orientation Histogram, are used for classification; some of these features have so far received little attention in document image layout analysis. Redundant and irrelevant features are removed by applying an Improved Fast Correlation-Based Filter feature selection algorithm. Finally, the segmentation results are refined by a smoothing post-processing procedure. The proposed method is demonstrated by experiments conducted on three different historical handwritten document image datasets. The experiments show the benefit of combining various color and texture features for classification, as well as the advantage of using a feature selection method to choose an optimal feature subset. The proposed method achieves higher accuracy than earlier work on several datasets; e.g., on the Parzival dataset, which contains about 100 million pixels, it achieves 93% accuracy compared with 91% for the previous method.
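The abstract names the per-pixel features but does not define them. As a rough illustration only, the minimal Python sketch below computes four of them for one pixel of a grayscale image. It assumes a 3x3 window with clamped borders, the textbook smoothness measure R = 1 - 1/(1 + var) with the variance normalized to [0, 1], a 4-neighbor discrete Laplacian, and the basic 8-neighbor LBP. The paper computes color variance on color images and also uses Gabor Dominant Orientation Histograms, which are omitted here. All function names are hypothetical; this is not the authors' implementation.

```python
"""Hypothetical per-pixel texture features (a sketch, not the paper's code).

The image is grayscale, stored as a list of rows of ints in [0, 255].
Coordinates outside the image are clamped to the nearest border pixel.
"""
from typing import List

Image = List[List[int]]


def _px(img: Image, y: int, x: int) -> int:
    """Return the pixel at (y, x), clamping coordinates at the borders."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def local_variance(img: Image, y: int, x: int, r: int = 1) -> float:
    """Variance of intensities in the (2r+1)x(2r+1) window centered on (y, x)."""
    vals = [_px(img, y + dy, x + dx)
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def smoothness(img: Image, y: int, x: int, r: int = 1) -> float:
    """Textbook smoothness R = 1 - 1/(1 + var), var normalized to [0, 1]."""
    var = local_variance(img, y, x, r) / (255.0 ** 2)
    return 1.0 - 1.0 / (1.0 + var)


def laplacian(img: Image, y: int, x: int) -> int:
    """4-neighbor discrete Laplacian at (y, x)."""
    return (_px(img, y - 1, x) + _px(img, y + 1, x)
            + _px(img, y, x - 1) + _px(img, y, x + 1) - 4 * _px(img, y, x))


# Neighbor offsets in clockwise order, starting at the top-left pixel.
_LBP_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp(img: Image, y: int, x: int) -> int:
    """Basic 8-neighbor Local Binary Pattern code (0..255) at (y, x)."""
    center = _px(img, y, x)
    code = 0
    for bit, (dy, dx) in enumerate(_LBP_OFFSETS):
        # A neighbor at least as bright as the center sets its bit.
        if _px(img, y + dy, x + dx) >= center:
            code |= 1 << bit
    return code


def pixel_features(img: Image, y: int, x: int) -> List[float]:
    """Concatenate the features into one vector for a pixel classifier."""
    return [local_variance(img, y, x), smoothness(img, y, x),
            float(laplacian(img, y, x)), float(lbp(img, y, x))]
```

On a perfectly flat patch, variance, smoothness, and Laplacian are all 0 and every LBP bit is set (code 255); a single bright pixel on a dark background gives a strongly negative Laplacian and an LBP code of 0.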
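The abstract does not describe what the "Improved" Fast Correlation-Based Filter changes. For orientation only, the sketch below implements the standard FCBF idea on discretized features: rank features by their symmetric uncertainty (SU) with the class label, then drop any feature that an already-kept, more relevant feature predicts at least as well as that feature predicts the class. The relevance threshold `delta` and all function names are assumptions, and the paper's improvements are not reproduced.

```python
"""Standard FCBF feature selection on discrete features (a sketch).

The paper uses an improved FCBF variant that is not described in the
abstract, so only the plain relevance/redundancy procedure is shown.
"""
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs: Sequence[Hashable],
                          ys: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables constant: treat as uninformative
    hxy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def fcbf(features: List[Sequence[Hashable]], labels: Sequence[Hashable],
         delta: float = 0.0) -> List[int]:
    """Return indices of the selected features, most relevant first."""
    # 1. Relevance: keep features whose SU with the class exceeds delta,
    #    sorted by decreasing relevance.
    su_c = [symmetric_uncertainty(f, labels) for f in features]
    order = sorted((i for i in range(len(features)) if su_c[i] > delta),
                   key=lambda i: su_c[i], reverse=True)
    selected: List[int] = []
    # 2. Redundancy: drop a feature if some already-kept feature is at
    #    least as correlated with it as it is with the class.
    for i in order:
        if all(symmetric_uncertainty(features[j], features[i]) < su_c[i]
               for j in selected):
            selected.append(i)
    return selected
```

For example, with labels [0, 0, 1, 1], a feature equal to the labels is kept, an exact copy of it is dropped as redundant, and the feature [0, 1, 0, 1] is dropped as irrelevant because its SU with the labels is 0. Continuous features such as variance or Laplacian responses would need to be discretized (binned) first.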
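The abstract mentions a smoothing post-processing procedure without specifying it. One common way to smooth a pixel label map, assumed here purely for illustration, is a majority (mode) filter: each pixel takes the most frequent of the labels (periphery, background, text block, decoration) in its neighborhood. The function name `majority_smooth` and the window radius `r` are hypothetical.

```python
"""Majority-filter smoothing of a per-pixel label map.

A hypothetical choice: the abstract mentions a smoothing post-processing
step but does not say which filter is used.
"""
from collections import Counter
from typing import List

LabelMap = List[List[str]]


def majority_smooth(labels: LabelMap, r: int = 1) -> LabelMap:
    """Relabel each pixel with the most frequent label in its window.

    The window is (2r+1)x(2r+1), cropped at the image border. A pixel keeps
    its own label unless another label has strictly more votes, so ties
    never change the map. Votes are always read from the input map, so the
    update is simultaneous and the input is not modified.
    """
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for y in range(h):
        for x in range(w):
            votes = Counter(
                labels[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            best, count = votes.most_common(1)[0]
            if count > votes[labels[y][x]]:
                out[y][x] = best
    return out
```

With this rule, an isolated "decoration" pixel surrounded by "text" is relabeled as "text", while a two-pixel map with one "text" and one "background" pixel stays unchanged because the vote is tied.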
Pages: 488-493
Number of pages: 6
Related papers
38 in total
  • [1] Page Segmentation for Historical Handwritten Document Images Using Conditional Random Fields
    Chen, Kai
    Seuret, Mathias
    Liwicki, Marcus
    Hennebert, Jean
    Liu, Cheng-Lin
    Ingold, Rolf
    PROCEEDINGS OF 2016 15TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2016, : 90 - 95
  • [2] Page Segmentation of Historical Document Images with Convolutional Autoencoders
    Chen, Kai
    Seuret, Mathias
    Liwicki, Marcus
    Hennebert, Jean
    Ingold, Rolf
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 1011 - 1015
  • [3] Page Segmentation for Historical Handwritten Documents Using Fully Convolutional Networks
    Xu, Yue
    He, Wenhao
    Yin, Fei
    Liu, Cheng-Lin
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 541 - 546
  • [4] Convolutional Neural Networks for Page Segmentation of Historical Document Images
    Chen, Kai
    Seuret, Mathias
    Hennebert, Jean
    Ingold, Rolf
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 965 - 970
  • [5] Page Segmentation for Historical Document Images Based on Superpixel Classification with Unsupervised Feature Learning
    Chen, Kai
    Liu, Cheng-Lin
    Seuret, Mathias
    Liwicki, Marcus
    Hennebert, Jean
    Ingold, Rolf
    PROCEEDINGS OF 12TH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, (DAS 2016), 2016, : 299 - 304
  • [6] Historical Handwritten Document Segmentation by Using a Weighted Loss
    Capobianco, Samuele
    Scommegna, Leonardo
    Marinai, Simone
    ARTIFICIAL NEURAL NETWORKS IN PATTERN RECOGNITION, ANNPR 2018, 2018, 11081 : 395 - 406
  • [7] Fully Convolutional Neural Networks for Page Segmentation of Historical Document Images
    Wick, Christoph
    Puppe, Frank
    2018 13TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (DAS), 2018, : 287 - 292
  • [8] Segmentation and Recognition for Historical Tibetan Document Images
    Ma, Longlong
    Long, Congjun
    Duan, Lijuan
    Zhang, Xiqun
    Li, Yanxing
    Zhao, Quanchao
    IEEE ACCESS, 2020, 8 : 52641 - 52651
  • [9] A deep Convolutional Encoder-Decoder Network for Page Segmentation of Historical Handwritten Documents into Text Zones
    Kaddas, Panagiotis
    Gatos, Basilis
    PROCEEDINGS 2018 16TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2018, : 259 - 264
  • [10] Weakly supervised precise segmentation for historical document images
    Xie, Zecheng
    Huang, Yaoxiong
    Jin, Lianwen
    Liu, Yuliang
    Zhu, Yuanzhi
    Gao, Liangcai
    Zhang, Xiaode
    NEUROCOMPUTING, 2019, 350 : 271 - 281