Which OCR toolset is good and why? A comparative study

Cited by: 3
Authors
Jain, Pooja [1 ]
Taneja, Kavita [1 ]
Taneja, Harmunish [2 ]
Affiliations
[1] Panjab Univ, Dept Comp Sci & Applicat, Chandigarh, India
[2] DAV Coll, Dept Comp Sci & Informat Tech, Sec 10, Chandigarh, India
Keywords
ABBYY FineReader; Calamari; Google Docs; OCR; Tesseract
DOI
10.48129/kjs.v48i2.9589
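Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract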
Optical Character Recognition (OCR) is an active research area spanning pattern recognition, natural language processing (NLP), computer vision, biomedical informatics, machine learning (ML), and artificial intelligence (AI). The technology extracts text from PDF files, scanned or handwritten documents, and images (photographs, advertisements, and the like) into an editable format (MS Word/Excel, text files, etc.) for further processing. It has been used in many real-world applications, including banking, education, insurance, finance, healthcare, and keyword-based search in documents. Many OCR toolsets are available in various categories, including open-source, proprietary, and online services. This research paper provides a comparative study of various OCR toolsets considering a variety of parameters.
Pages: 12
Related papers
19 in total
  • [1] Al-Hmouz R, 2020, KUWAIT J SCI, V47
  • [2] [Anonymous], 2008, DOCUMENT RECOGNITION
  • [3] High Performance OCR for Camera-Captured Blurred Documents with LSTM Networks
    Asad, Fallak
    Ul-Hasan, Adnan
    Shafait, Faisal
    Dengel, Andreas
    [J]. PROCEEDINGS OF 12TH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, (DAS 2016), 2016, : 7 - 12
  • [4] OMNIDOCUMENT TECHNOLOGIES
    BOKSER, M
    [J]. PROCEEDINGS OF THE IEEE, 1992, 80 (07) : 1066 - 1078
  • [5] Rosetta: Large Scale System for Text Detection and Recognition in Images
    Borisyuk, Fedor
    Gordo, Albert
    Sivakumar, Viswanath
    [J]. KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 71 - 79
  • [6] High-Performance OCR for Printed English and Fraktur using LSTM Networks
    Breuel, Thomas M.
    Ul-Hasan, Adnan
    Al Azawi, Mayce
    Shafait, Faisal
    [J]. 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 683 - 687
  • [7] Cao H., 2014, HDB DOCUMENT IMAGE P, P331
  • [8] Dhiman S., 2013, International Journal of Recent Technology and Engineering, V2, P80
  • [9] Gabasio A., 2013, COMP OPTICAL CHARACT, P8
  • [10] Goswami R., 2013, INT J COMPUTER APPL, V83