Script determination of mixed Chinese/English document images using Kolmogorov Complexity measure

Times cited: 0
Authors
Chi, ZR [1]
Wang, Q [1]
Affiliation
[1] Hong Kong Polytech Univ, Ctr Multimedia Signal Proc, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
Source
SECOND INTERNATIONAL CONFERENCE ON IMAGE AND GRAPHICS, PTS 1 AND 2 | 2002 / Vol. 4875
Keywords
script determination; Kolmogorov complexity; document image processing
DOI
10.1117/12.477053
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose an approach based on the Kolmogorov Complexity (KC) measure for determining script classes in mixed Chinese (complex characters)/English document images. The approach consists of two main steps, document image preprocessing and KC measurement, and successfully separates Chinese text lines from English ones. It is robust and reliable in handling document images of different appearances and densities, as well as characters of various fonts, sizes and styles. Experimental results on a set of 40 text-line images (20 English and 20 Complex Chinese) taken from various document images show a 100% correct classification rate.
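The record names the two steps, preprocessing and the KC measure, but gives no details of either. Kolmogorov complexity itself is uncomputable, so in practice it is estimated; a widely used estimate for binary sequences is the Lempel-Ziv (1976) complexity, counted with the Kaspar-Schuster scanning algorithm. The Python sketch below applies that estimate to a text-line image. It is an illustration under stated assumptions, not the authors' procedure: the fixed binarization threshold of 128, the row-major scan, the n / log2(n) normalization, and all function names (`binarize`, `lz76_complexity`, `normalized_lz`, `line_image_complexity`) are hypothetical choices.

```python
import math
import random


def binarize(gray, threshold=128):
    """Hypothetical preprocessing step (assumed, not from the paper):
    dark (ink) pixels -> 1, background pixels -> 0."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def lz76_complexity(s):
    """Count the phrases in the Lempel-Ziv (1976) parsing of sequence s,
    using the Kaspar-Schuster (1987) scanning scheme."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l = 0, 1, 1   # i: match start in prefix, k: match length, l: current phrase start
    c, k_max = 1, 1     # c: phrase count (the first symbol is always a phrase)
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                      # extend the current match
            if l + k > n:               # match reaches the end: count the last phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1                      # try the next start position in the prefix
            if i == l:                  # no earlier start reproduces it: new phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(seq):
    """LZ76 complexity scaled by log2(n) / n; this approaches 1 for long
    random binary sequences and stays near 0 for periodic ones."""
    n = len(seq)
    if n < 2:
        return 0.0
    return lz76_complexity(seq) * math.log2(n) / n


def line_image_complexity(gray, threshold=128):
    """Binarize a grayscale text-line image (assumed: list of rows of 0-255
    ints) and estimate the complexity of its row-by-row pixel scan."""
    bits = [b for row in binarize(gray, threshold) for b in row]
    return normalized_lz(bits)


if __name__ == "__main__":
    rng = random.Random(0)
    noisy = [[rng.choice((0, 255)) for _ in range(40)] for _ in range(25)]
    striped = [[0 if x % 4 < 2 else 255 for x in range(40)] for _ in range(25)]
    print(line_image_complexity(noisy) > line_image_complexity(striped))  # True
```

How the score would be thresholded to label a line as Chinese or English is not stated in this record, so the sketch stops at computing it; any decision rule would have to be fitted to labeled text lines.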
Pages: 686-692
Number of pages: 7