Word-Level Thirteen Official Indic Languages Database for Script Identification in Multi-script Documents

Times cited: 0
Authors
Obaidullah, Sk Md [1]
Santosh, K. C. [2]
Halder, Chayan [3]
Das, Nibaran [4]
Roy, Kaushik [3]
Affiliations
[1] Aliah Univ Kolkata, Dept Comp Sci & Engn, Kolkata, W Bengal, India
[2] Univ South Dakota, Dept Comp Sci, Vermillion, SD 57069 USA
[3] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
[4] West Bengal State Univ, Dept Comp Sci, Kolkata, India
Keywords
Multi-script documents; Official Indic script database; Script identification
DOI
10.1007/978-981-10-4859-3_2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Without a publicly available database, research cannot advance, nor can new methods be compared fairly with the state of the art. To bridge this gap, we present a database of eleven Indic scripts from thirteen official languages for script identification in multi-script document images. The database comprises 39K words, equally distributed (i.e., 3K words per language). We also study three pertinent features, spatial energy (SE), wavelet energy (WE) and the Radon transform (RT), including their possible combinations, using three classifiers: multilayer perceptron (MLP), fuzzy unordered rule induction algorithm (FURIA) and random forest (RF). In our tests, using all features, MLP performs best, achieving bi-script accuracies of 99.24% (keeping Roman common) and 98.38% (keeping Devanagari common), and a tri-script accuracy of 98.19% (keeping both Devanagari and Roman common).
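The abstract names the three feature families but does not define them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes spatial energy is the mean squared pixel intensity, wavelet energy is the per-sub-band energy of a one-level orthonormal Haar transform, and the Radon transform is sampled only at 0 and 90 degrees, where it reduces to column and row sums. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

# A grayscale word image as rows of pixel values (assumed representation).
Image = List[List[float]]


def spatial_energy(img: Image) -> float:
    """Mean squared pixel intensity (an assumed definition of SE)."""
    pixels = [p for row in img for p in row]
    return sum(p * p for p in pixels) / len(pixels)


def haar_subband_energies(img: Image) -> Dict[str, float]:
    """Energy of each sub-band of a one-level orthonormal 2D Haar transform.

    Height and width must be even. Each 2x2 block yields one coefficient
    per sub-band; the squared coefficients are summed per sub-band.
    """
    height, width = len(img), len(img[0])
    if height % 2 or width % 2:
        raise ValueError("image height and width must be even")
    energies = {"LL": 0.0, "col_diff": 0.0, "row_diff": 0.0, "diag": 0.0}
    for r in range(0, height, 2):
        for c in range(0, width, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            energies["LL"] += ((a + b + d + e) / 2) ** 2        # block average
            energies["col_diff"] += ((a - b + d - e) / 2) ** 2  # left vs right
            energies["row_diff"] += ((a + b - d - e) / 2) ** 2  # top vs bottom
            energies["diag"] += ((a - b - d + e) / 2) ** 2      # diagonal detail
    return energies


def axis_projections(img: Image) -> Tuple[List[float], List[float]]:
    """Axis-aligned Radon projections: column sums and row sums."""
    column_sums = [sum(col) for col in zip(*img)]
    row_sums = [sum(row) for row in img]
    return column_sums, row_sums


def feature_vector(img: Image) -> List[float]:
    """Concatenate SE, the four Haar energies and the two projection energies."""
    column_sums, row_sums = axis_projections(img)
    wavelet = haar_subband_energies(img)
    return [
        spatial_energy(img),
        wavelet["LL"],
        wavelet["col_diff"],
        wavelet["row_diff"],
        wavelet["diag"],
        sum(v * v for v in column_sums),
        sum(v * v for v in row_sums),
    ]
```

Because the Haar transform used here is orthonormal, the four sub-band energies sum to the total squared pixel intensity, which gives a quick sanity check on the implementation. A vector like this could then be passed to any of the classifiers named in the abstract; the classifier itself is not sketched here.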
Pages: 16-27
Page count: 12