Identification of Indic Scripts on Torn-Documents

Cited by: 6
Authors
Chanda, Sukalpa [1]
Franke, Katrin [1]
Pal, Umapada [2]
Affiliations
[1] Gjovik Univ Coll, Dept Comp Sci & Media Technol, N-2815 Gjovik, Norway
[2] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata 700108, India
Source
11th International Conference on Document Analysis and Recognition (ICDAR 2011), 2011
Keywords
Script Identification; Torn Document; Gaussian Kernel SVM; Computational Forensics;
DOI
10.1109/ICDAR.2011.149
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Questioned Document Examination processes often involve the analysis of torn documents. To aid a forensic expert, automatic classification of the content type of torn-document fragments could be useful, since it would help the expert sort similar fragments out of a pile of torn documents. One parameter of similarity is the script of the text. In this article we propose a method to identify the script of document fragments. Because text on torn documents normally appears in arbitrary orientations, we use rotation-invariant Zernike moment features together with a Support Vector Machine (SVM) to classify the script type. Gradient features are then used to compare the results of a rotation-dependent feature type against those of the rotation-invariant one. When dealing with 11 different scripts, we achieved an overall script-identification accuracy of 81.39% at the character/connected-component level and 94.65% at the word level.
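The abstract relies on the fact that the magnitude of a Zernike moment does not change when the input is rotated. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a square, binarized connected-component image mapped onto the unit disk, and it uses a hypothetical maximum order of 8. The function names `radial_poly`, `zernike_moment` and `zernike_features` are illustrative only, and the abstract does not state which orders or normalization the paper actually uses.

```python
import numpy as np
from math import factorial


def radial_poly(n: int, m: int, rho: np.ndarray) -> np.ndarray:
    """Zernike radial polynomial R_{n,m}(rho); assumes |m| <= n and n - |m| even."""
    m = abs(m)
    out = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        coef = ((-1) ** s * factorial(n - s)
                / (factorial(s)
                   * factorial((n + m) // 2 - s)
                   * factorial((n - m) // 2 - s)))
        out += coef * rho ** (n - 2 * s)
    return out


def zernike_moment(img: np.ndarray, n: int, m: int) -> complex:
    """Complex moment A_{n,m} of a square image whose pixel grid is mapped onto [-1, 1]^2."""
    if img.ndim != 2 or img.shape[0] != img.shape[1]:
        raise ValueError("expected a square 2-D image")
    size = img.shape[0]
    # Exactly symmetric grid coordinates, so rotating the image permutes grid points.
    coords = (2 * np.arange(size) - (size - 1)) / (size - 1)
    x, y = np.meshgrid(coords, -coords)  # y axis points up
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0  # only pixels inside the unit disk contribute
    # Conjugate basis function V*_{n,m} = R_{n,m}(rho) * exp(-i m theta).
    basis_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
    pixel_area = (2.0 / (size - 1)) ** 2
    return (n + 1) / np.pi * np.sum(img[inside] * basis_conj[inside]) * pixel_area


def zernike_features(img: np.ndarray, max_order: int = 8) -> np.ndarray:
    """Rotation-invariant feature vector |A_{n,m}| for n <= max_order, 0 <= m <= n, n - m even."""
    img = img.astype(float)
    feats = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):
            feats.append(abs(zernike_moment(img, n, m)))
    return np.array(feats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    glyph = (rng.random((32, 32)) > 0.7).astype(float)  # stand-in for a binarized component
    f0 = zernike_features(glyph)
    f90 = zernike_features(np.rot90(glyph))
    print(f0.shape, np.allclose(f0, f90))  # prints: (25,) True
```

Rotating the image only multiplies each A_{n,m} by a unit-modulus phase factor, so the magnitudes match up to floating-point error. Arbitrary-angle rotations on a pixel grid also involve resampling, so in practice the invariance is approximate. In a pipeline like the one described, such vectors could then be fed to a Gaussian (RBF) kernel SVM, for example scikit-learn's `SVC(kernel="rbf")`; that choice of library is an assumption, not something the abstract specifies.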
Pages: 713-717
Page count: 5