Script Identification for Printed and Handwritten Indian Documents: An Empirical Study of Different Feature Classifier Combinations

Cited by: 5
Authors
Rani, Rajneesh [1]
Dhir, Renu [1]
Kakkar, Deepti [2]
Sharma, Nonita [1]
Affiliations
[1] Dr BR Ambedkar Natl Inst Technol, Dept Comp Sci & Engn, Jalandhar 144011, Punjab, India
[2] Dr BR Ambedkar Natl Inst Technol, Dept Elect & Commun Engn, Jalandhar 144011, Punjab, India
Keywords
Script identification; page level; texture features; machine learning; Gabor; wavelet; invariant texture features; rotation
DOI
10.1142/S0219467821400118
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Identifying the script of a document page image is the first step for an OCR system that processes multi-script documents. In a multilingual, multi-script world, document processing systems that rely on human involvement to select the appropriate OCR package are undesirable and inefficient. Developing robust and efficient methods for automatic script identification is therefore of major importance for automatic document processing in multilingual, multi-script environments. The basic objective is thus to devise intuitive methods that are straightforward to implement without compromising efficiency. This work evaluates state-of-the-art feature extraction and classification techniques for automatic script identification of printed and handwritten documents and proposes the best feature-classifier combination for this task.
Pages: 21