Word-Level Thirteen Official Indic Languages Database for Script Identification in Multi-script Documents

Cited by: 0

Authors:

Obaidullah, Sk Md [1]

Santosh, K. C. [2]

Halder, Chayan [3]

Das, Nibaran [4]

Roy, Kaushik [3]

Affiliations:

[1] Aliah Univ Kolkata, Dept Comp Sci & Engn, Kolkata, W Bengal, India

[2] Univ South Dakota, Dept Comp Sci, Vermillion, SD 57069 USA

[3] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India

[4] West Bengal State Univ, Dept Comp Sci, Kolkata, India

Source:

RECENT TRENDS IN IMAGE PROCESSING AND PATTERN RECOGNITION (RTIP2R 2016) | 2017 / Vol. 709

Keywords:

Multi-script documents; Official Indic script database; Script identification

DOI:

10.1007/978-981-10-4859-3_2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Without a publicly available database, we can neither advance research nor make a fair comparison with state-of-the-art methods. To bridge this gap, we present a database of eleven Indic scripts covering thirteen official languages for script identification in multi-script document images. The database comprises 39K words distributed equally across languages (i.e., 3K words per language). We also study three pertinent features, spatial energy (SE), wavelet energy (WE) and the Radon transform (RT), along with their possible combinations, using three classifiers: multilayer perceptron (MLP), fuzzy unordered rule induction algorithm (FURIA) and random forest (RF). In our tests, using all features, MLP performs best, achieving bi-script accuracies of 99.24% (keeping Roman common) and 98.38% (keeping Devanagari common), and a tri-script accuracy of 98.19% (keeping both Devanagari and Roman common).
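
To make the idea of a wavelet energy feature concrete, here is a minimal sketch of one common way such a feature can be computed from a word image. This is not the authors' implementation: the abstract does not say which wavelet, how many decomposition levels, or which normalisation the paper uses. The sketch assumes a one-level Haar decomposition (averaging variant) and defines energy as the mean squared coefficient of each sub-band. The names haar_level1, band_energies and the sub-band labels are hypothetical.

```python
from typing import Dict, List

Image = List[List[float]]  # grayscale image as rows of pixel values


def haar_level1(img: Image) -> Dict[str, Image]:
    """One-level 2D Haar decomposition (assumed wavelet; averaging variant).

    Each non-overlapping 2x2 block yields one coefficient per sub-band.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    bands: Dict[str, Image] = {
        "approx": [], "row_diff": [], "col_diff": [], "diag": []
    }
    for r in range(0, rows, 2):
        out = {name: [] for name in bands}
        for c in range(0, cols, 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            out["approx"].append((a + b + d + e) / 4.0)    # local average
            out["row_diff"].append((a + b - d - e) / 4.0)  # top row minus bottom row
            out["col_diff"].append((a - b + d - e) / 4.0)  # left column minus right column
            out["diag"].append((a - b - d + e) / 4.0)      # diagonal difference
        for name in bands:
            bands[name].append(out[name])
    return bands


def band_energies(img: Image) -> Dict[str, float]:
    """Mean squared coefficient of each sub-band (assumed energy definition)."""
    energies: Dict[str, float] = {}
    for name, band in haar_level1(img).items():
        count = len(band) * len(band[0])
        energies[name] = sum(v * v for row in band for v in row) / count
    return energies
```

For example, a 4x4 image of vertical stripes, band_energies([[1, 0, 1, 0]] * 4), puts all of its detail energy in the col_diff band (0.25) and none in row_diff or diag. Stroke orientation showing up as a difference between sub-band energies is the kind of cue such a feature can pass to a classifier.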

Pages: 16-27

Number of pages: 12
