A Text-Line Segmentation Method for Historical Tibetan Documents Based on Baseline Detection

Cited by: 7
Authors
Li, Yanxing [1, 2]
Ma, Longlong [3]
Duan, Lijuan [1, 4]
Wu, Jian [1, 3]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
[2] Beijing Key Lab Trusted Comp, Beijing, Peoples R China
[3] Chinese Acad Sci, Inst Software, Chinese Informat Proc Lab, Beijing, Peoples R China
[4] Beijing Key Lab Integrat & Anal Large Scale Strea, Beijing, Peoples R China
Source
COMPUTER VISION, PT I | 2017 / Vol. 771
Keywords
Historical Tibetan document; Text-line segmentation; Baseline detection;
DOI
10.1007/978-981-10-7299-4_29
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Text-line segmentation is an important task in historical Tibetan document recognition. Historical Tibetan document images often contain touching or overlapping characters between consecutive text-lines, which makes text-line segmentation difficult. In this paper, we present a text-line segmentation method based on baseline detection. The initial baseline position of each line is obtained by template matching, pruning algorithms, and a closing operation. The baseline is then estimated by dynamic tracing over the pixel points of each line, using the context information between pixel points. Overlapping or touching areas are cut at the stroke of minimum width. Finally, text-lines are extracted based on the estimated baselines and the cut positions of the touching areas. The proposed algorithm has been evaluated on a dataset of historical Tibetan document images, and the experimental results show the effectiveness of the proposed method.
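The abstract does not say how the minimum-width stroke is located. As a minimal sketch only, and not the authors' implementation, the idea can be illustrated on a binarized touching area by choosing the row with the fewest ink pixels. The names `min_stroke_cut_row` and `touching` are hypothetical, and counting ink pixels per row equals the stroke width only under the simplifying assumption that each row crosses a single contiguous stroke.

```python
def min_stroke_cut_row(region):
    """Return the index of the row where the ink is narrowest.

    `region` is a list of rows, each a list of 0/1 pixels (1 = ink),
    covering a touching area between two adjacent text-lines.
    Rows with no ink are skipped because they contain no stroke to cut.
    Returns None if the region contains no ink at all.
    """
    best_row, best_width = None, None
    for y, row in enumerate(region):
        width = sum(row)  # ink pixels in this row (assumed: one stroke per row)
        if width == 0:
            continue
        if best_width is None or width < best_width:
            best_row, best_width = y, width
    return best_row


# Toy touching area (hypothetical data): a stroke that narrows in the middle.
touching = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
]
cut_row = min_stroke_cut_row(touching)
```

Ties are resolved in favor of the topmost row; a real system would likely combine this with the estimated baseline positions to decide which line each side of the cut belongs to.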
Pages: 356-367
Page count: 12