Rule-Based Page Segmentation for Palm Leaf Manuscript on Color Image

Cited by: 0
Authors
Inkeaw, Papangkorn [1]
Bootkrajang, Jakramate [1]
Charoenkwan, Phasit [2]
Marukatat, Sanparith [3]
Ho, Shinn-Ying [4]
Chaijaruwanich, Jeerayut [1]
Affiliations
[1] Chiang Mai Univ, Dept Comp Sci, Fac Sci, Chiang Mai, Thailand
[2] Chiang Mai Univ, Coll Arts Media & Technol, Chiang Mai, Thailand
[3] Natl Elect & Comp Technol Ctr, Pathum Thani, Thailand
[4] Natl Chiao Tung Univ, Inst Bioinformat & Syst Biol, Hsinchu, Taiwan
Source
DIGITAL LIBRARIES: KNOWLEDGE, INFORMATION, AND DATA IN AN OPEN ACCESS SOCIETY | 2016 / Vol. 10075
Keywords
Palm leaf manuscripts; Page segmentation; L*a*b* Color Space; Rule-based selection
DOI
10.1007/978-3-319-49304-6_16
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Palm leaf manuscripts are an important source of history and ancient wisdom. A large number of manuscripts have already been digitized in the form of folio images. To extract useful information from them, optical character recognition (OCR) is often considered the first step towards text mining. Unfortunately, each folio image contains multiple unsegmented palm leaf images, which makes the images difficult to handle in the OCR process. This motivates us to propose a new page segmentation method for palm leaf manuscripts. The method consists of two main steps. The first is the detection of objects in folio images using the Connected Component Labeling method in a transformed L*a*b* color space. The second is the rule-based classification of each detected object as either palm leaf or not palm leaf. Experiments performed on 20 publicly available palm leaf manuscripts, comprising 384 folio images, demonstrated that the proposed method effectively segmented folio images into separate palm leaf images, with a precision of 99.86% and a recall of 96.67%.
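The two-step pipeline summarized in the abstract (object detection by Connected Component Labeling in an L*a*b* color space, then rule-based selection) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not say how the L*a*b* space is "transformed" or which selection rules are used. The standard sRGB to CIE L*a*b* (D65) conversion, the foreground test b* > 15, the 4-connected flood-fill labeling, and the minimum-area and elongation rules are therefore all hypothetical choices. The function names srgb_to_lab, label_components, bounding_box, is_palm_leaf and segment_folio are likewise illustrative.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int, int]     # 8-bit sRGB triple
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def srgb_to_lab(rgb: Pixel) -> Tuple[float, float, float]:
    """Convert one 8-bit sRGB pixel to CIE L*a*b* under a D65 white point."""
    def linearize(c: float) -> float:
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    r, g, b = (linearize(v) for v in rgb)
    # Linear sRGB -> CIE XYZ, normalized by the D65 reference white.
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 1.00000
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883
    fx, fy, fz = f(x), f(y), f(z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def label_components(mask: List[List[bool]]) -> List[List[Tuple[int, int]]]:
    """Connected Component Labeling with 4-connectivity (BFS flood fill)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels: List[Tuple[int, int]]) -> Box:
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def is_palm_leaf(pixels: List[Tuple[int, int]], min_area: int,
                 min_elongation: float) -> bool:
    """Hypothetical rules: a leaf is large and much longer than it is wide."""
    top, left, bottom, right = bounding_box(pixels)
    height, width = bottom - top + 1, right - left + 1
    elongation = max(height, width) / min(height, width)
    return len(pixels) >= min_area and elongation >= min_elongation


def segment_folio(image: List[List[Pixel]], b_threshold: float = 15.0,
                  min_area: int = 20, min_elongation: float = 3.0) -> List[Box]:
    """Step 1: L*a*b* mask + labeling. Step 2: rule-based selection."""
    # Assumed foreground rule: leaves are yellowish, i.e. have a high b* value.
    mask = [[srgb_to_lab(px)[2] > b_threshold for px in row] for row in image]
    return [bounding_box(comp) for comp in label_components(mask)
            if is_palm_leaf(comp, min_area, min_elongation)]


if __name__ == "__main__":
    BG, LEAF = (0, 0, 0), (200, 170, 110)
    folio = [[BG] * 30 for _ in range(10)]
    for y in range(2, 5):          # a 3 x 26 leaf-colored strip
        for x in range(2, 28):
            folio[y][x] = LEAF
    for y in range(7, 9):          # a 2 x 2 speck of the same color
        for x in range(5, 7):
            folio[y][x] = LEAF
    print(segment_folio(folio))    # -> [(2, 2, 4, 27)]
```

In the toy folio, both the strip and the speck pass the color test, but only the strip survives the selection rules, because the speck is too small and not elongated. On real scans one would more likely use OpenCV (`cv2.cvtColor` with `cv2.COLOR_BGR2Lab`, then `cv2.connectedComponentsWithStats`) instead of pure-Python loops. Note that for 8-bit input OpenCV rescales L* to 0-255 and offsets a* and b* by 128, so thresholds like the one assumed here would need to be adjusted.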
Pages: 127-136
Number of pages: 10