Recognition of Offline Handwritten Chinese Characters Using the Tesseract Open Source OCR Engine

Cited by: 6
Authors
Li, Qi [1]
An, Weihua [1]
Zhou, Anmi [1]
Ma, Lehui [1]
Affiliation
[1] Beijing Language & Culture Univ, Coll Informat Sci, Beijing, Peoples R China
Source
2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS (IHMSC), VOL. 2 | 2016
Keywords
Offline handwritten Chinese characters; Optical Character Recognition; Tesseract
DOI
10.1109/IHMSC.2016.239
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because of the characters' complex structure and the deformation introduced by handwriting, recognition of offline handwritten Chinese characters has been one of the most challenging problems. This paper presents an offline handwritten Chinese character recognition tool built on the Tesseract open source OCR engine. The tool makes two main contributions. First, it generates a handwritten Chinese character feature library that is independent of any specific user's writing style. Second, by preprocessing the input image and adjusting the Tesseract engine, it outputs multiple candidate recognition results ranked by weight. The tool's recognition accuracy is above 88% on both the known-user test set and the unknown-user test set. These results indicate that the Tesseract engine is, to a certain degree, feasible for offline handwritten Chinese character recognition.
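The two steps named in the abstract, preprocessing the input image and returning several candidates ranked by weight, can be illustrated with a minimal sketch. This is not the authors' implementation and it does not call Tesseract. The function names `binarize` and `rank_candidates`, the fixed threshold of 128, and the example weights are all hypothetical, chosen only to show the shape of the idea.

```python
from typing import Dict, List, Tuple


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map a grayscale image (values 0-255) to 0 (ink) or 1 (background).

    A fixed global threshold is an assumption made for illustration; the
    abstract does not say which preprocessing the tool applies.
    """
    return [[0 if px < threshold else 1 for px in row] for row in gray]


def rank_candidates(weights: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k candidate characters with the highest weight, best first."""
    ordered = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return ordered[:k]


# Hypothetical weights for one handwritten character image (not real results).
example_weights = {"未": 0.61, "末": 0.27, "木": 0.08, "本": 0.04}
top3 = rank_candidates(example_weights, k=3)
```

Returning a short ranked list instead of a single answer lets a downstream step, or the user, choose among visually similar characters.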
Pages: 452-456
Page count: 5