A Semi-Supervised Ensemble Learning Approach for Character Labeling with Minimal Human Effort

Cited by: 23
Authors
Vajda, Szilard [1]
Junaidi, Akmal [1]
Fink, Gernot A. [1]
Affiliation
[1] TU Dortmund, Dept Comp Sci, Dortmund, Germany
Source
11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011) | 2011
Keywords
semi-supervised character labeling; clustering; ensemble learning; Lampung characters;
DOI
10.1109/ICDAR.2011.60
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
One of the major issues in handwritten character recognition is the efficient creation of ground truth to train and test the different recognizers. Manual labeling of the data by a human expert is a tedious and costly procedure. In this paper we propose an efficient, low-cost, semi-automatic labeling system for character datasets. First, the data is represented at different abstraction levels, and each representation is then clustered in an unsupervised manner. The resulting clusters are labeled by human experts, and finally a unanimity vote decides whether a label is accepted or not. The experimental results show that labeling less than 0.5% of the training data is sufficient to achieve an 86.21% recognition rate for a brand new script (Lampung) and 94.81% for the MNIST benchmark dataset, using only a K-nearest neighbor classifier for recognition.
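The unanimity voting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `unanimity_vote`, the data layout, and the toy labels "ka" and "ga" are all assumptions made for illustration. The sketch assumes each representation ("view") has already been clustered and that an expert has assigned one label to every cluster; a sample keeps a label only when all views agree on it.

```python
from typing import Dict, List, Optional, Sequence


def unanimity_vote(
    cluster_ids_per_view: Sequence[Sequence[int]],
    expert_labels_per_view: Sequence[Dict[int, str]],
) -> List[Optional[str]]:
    """Return one label per sample, or None when the views disagree.

    cluster_ids_per_view[v][i] is the cluster of sample i in view v.
    expert_labels_per_view[v][c] is the expert's label for cluster c in view v.
    Both structures are hypothetical; the abstract does not specify them.
    """
    n_samples = len(cluster_ids_per_view[0])
    accepted: List[Optional[str]] = []
    for i in range(n_samples):
        # Each view proposes the expert label of the cluster holding sample i.
        votes = {
            labels[ids[i]]
            for ids, labels in zip(cluster_ids_per_view, expert_labels_per_view)
        }
        # Accept the label only if every view proposed the same one.
        accepted.append(votes.pop() if len(votes) == 1 else None)
    return accepted


# Toy example: five samples, two views, two clusters per view.
cluster_ids = [
    [0, 0, 1, 1, 1],  # view A
    [1, 1, 0, 0, 1],  # view B
]
expert_labels = [
    {0: "ka", 1: "ga"},  # expert labels for view A's clusters
    {0: "ga", 1: "ka"},  # expert labels for view B's clusters
]
labels = unanimity_vote(cluster_ids, expert_labels)
print(labels)  # the last sample is rejected because the views disagree
```

In this sketch the expert labels only clusters, not individual samples, which is where the saving in human effort comes from; rejected samples (marked `None`) are simply left unlabeled.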
Pages: 259-263
Page count: 5