A deep learning model for Ottoman OCR

Cited by: 5
Authors
Dolek, Ishak [1]
Kurt, Atakan [1]
Affiliation
[1] Istanbul Univ Cerrahpasa, Engn Sch, Comp Engn Dept, Istanbul, Turkey
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2022 / Vol. 34 / No. 20
Keywords
CNN; CTC; deep neural networks; LSTM; OCR; Ottoman; printed naksh font; RNN; NEURAL-NETWORK; RECOGNITION; SEGMENTATION; RETRIEVAL;
DOI
10.1002/cpe.6937
CLC classification number
TP31 [Computer software];
Discipline code
081202 ; 0835 ;
Abstract
Ottoman OCR remains an open problem: OCR models built for Arabic do not perform well on Ottoman texts, and models trained specifically on Ottoman documents have not produced satisfactory results either. We present a deep learning model, and an OCR tool built on it, for printed Ottoman documents in the naksh font. We propose an end-to-end trainable CRNN architecture consisting of CNN, RNN (LSTM), and CTC layers for the Ottoman OCR problem. This model, called Hybrid, was compared experimentally with Tesseract Arabic, Tesseract Persian, ABBYY FineReader, Miletos, and Google Docs OCR on a test set of 21 pages of original documents. With character recognition accuracies of 88.86% on raw text, 96.12% on normalized text, and 97.37% on joined text, the Hybrid model outperforms the others by a marked difference. It beats the next best model by a clear margin of 4%, a significant improvement given the difficulty of Ottoman OCR and the huge size of the Ottoman archives awaiting processing. The Hybrid model also achieves 58% word recognition accuracy on normalized text, the only word recognition rate above 50% among the tools compared.
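The abstract does not say how the CTC output is decoded or exactly how character recognition accuracy is computed. As a hedged illustration only, the sketch below shows best-path (greedy) CTC decoding, which turns the per-frame label scores that a CNN + LSTM + CTC stack emits into a string by taking the best label per frame, merging consecutive repeats, and dropping blanks, together with a common edit-distance-based character accuracy, 1 - d(reference, hypothesis) / |reference|. The blank index 0, the toy alphabet, and the names `ctc_greedy_decode`, `levenshtein`, and `char_accuracy` are assumptions for illustration, not the authors' code; the paper may use beam-search decoding or a different accuracy formula.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    Label k (k >= 1) is assumed to map to alphabet[k - 1].
    """
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    out: List[str] = []
    prev = None
    for label in best:
        # A repeated label is merged unless a blank separated the two frames.
        if label != prev and label != BLANK:
            out.append(alphabet[label - 1])
        prev = label
    return "".join(out)


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - levenshtein(reference, hypothesis) / len(reference))


if __name__ == "__main__":
    # Toy per-frame scores over [blank, 'a', 'b', 'c'] (hypothetical values).
    frames = [
        [0.1, 0.8, 0.05, 0.05],   # 'a'
        [0.1, 0.7, 0.1, 0.1],     # 'a' again, merged with the previous frame
        [0.9, 0.05, 0.03, 0.02],  # blank
        [0.1, 0.1, 0.1, 0.7],     # 'c'
    ]
    hyp = ctc_greedy_decode(frames, "abc")
    print(hyp)                                # ac
    print(f"{char_accuracy('abc', hyp):.3f}")  # 0.667
```

In a real Ottoman model the alphabet would be the full set of naksh glyph or character classes; greedy decoding is shown because it is the simplest correct CTC decoder, not because the source states it was used.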
Pages: 17
Related papers (50 in total)
  • [1] Ottoman OCR with deep neural networks
    Dolek, Ishak
    Kurt, Atakan
    JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2023, 38 (04): 2579 - 2593
  • [2] Amharic OCR: An End-to-End Learning
    Belay, Birhanu
    Habtegebrial, Tewodros
    Meshesha, Million
    Liwicki, Marcus
    Belay, Gebeyehu
    Stricker, Didier
    APPLIED SCIENCES-BASEL, 2020, 10 (03)
  • [3] MC-OCR Challenge 2021: Deep Learning Approach for Vietnamese Receipts OCR
    Bui, Doanh C.
    Dung Truong
    Vo, Nguyen D.
    Khang Nguyen
    2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021), 2021: 94 - 99
  • [4] ADOCRNet: A Deep Learning OCR for Arabic Documents Recognition
    Mosbah, Lamia
    Moalla, Ikram
    Hamdani, Tarek M.
    Neji, Bilel
    Beyrouthy, Taha
    Alimi, Adel M.
    IEEE ACCESS, 2024, 12: 55620 - 55631
  • [5] Typewritten OCR Model for Ethiopic Characters
    Deneke, Bereket Siraw
    Aga, Rosa Tsegaye
    Samuel, Mesay
    Mulat, Abel
    Mulat, Ashenafi
    Abebe, Abel
    Mekonnen, Rahel
    Mulugeta, Hiwot
    Debelee, Taye Girma
    Gachena, Worku
    PAN-AFRICAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PT I, PANAFRICON AI 2023, 2024, 2068: 250 - 261
  • [6] Wild OCR: Deep Learning Architecture for Text Recognition in Images
    Amudha, J.
    Thakur, Manmohan Singh
    Shrivastava, Anupriya
    Gupta, Shubham
    Gupta, Deepa
    Sharma, Kshitij
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION NETWORKS (ICCCN 2021), 2022, 394: 499 - 506
  • [7] Analysis of Recent Deep Learning Techniques for Arabic Handwritten-Text OCR and Post-OCR Correction
    Najam, Rayyan
    Faizullah, Safiullah
    APPLIED SCIENCES-BASEL, 2023, 13 (13)
  • [8] Smart OCR for Recognizing Bangla Characters with CRAFT and Deep Learning Models
    Hasan, Md Rakibul
    Pew, Anamika Basak
    Alam, Sanzida
    Rifha, Nafisa Tasnim
    Shams, Shamin Yeaser
    Shahriar, Farhan
    Rahman, Rashedur M.
    2022 IEEE 13TH ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2022: 573 - 577
  • [9] An Automatic Framework for Number Plate Detection using OCR and Deep Learning Approach
    Shambharkar, Yash
    Salagrama, Shailaja
    Sharma, Kanhaiya
    Mishra, Om
    Parashar, Deepak
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (04): 8 - 14
  • [10] SiamHRnet-OCR: A Novel Deforestation Detection Model with High-Resolution Imagery and Deep Learning
    Wang, Zhipan
    Liu, Di
    Liao, Xiang
    Pu, Weihua
    Wang, Zhongwu
    Zhang, Qingling
    REMOTE SENSING, 2023, 15 (02)