The impact of OCR accuracy on automatic text classification

Cited by: 0
Authors
Zu, GW
Murata, M
Ohyama, W
Wakabayashi, T
Kimura, F
Affiliations
[1] Mie Univ, Fac Engn, Tsu, Mie 5148507, Japan
[2] Toshiba Solut Corp, Syst Integrat Technol Ctr, Minato Ku, Tokyo 1056691, Japan
Source
CONTENT COMPUTING, PROCEEDINGS | 2004 / Vol. 3309
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The common approach to digitizing paper media is to convert the documents into digital images with a scanner and then read the images with OCR to generate ASCII text for full-text retrieval. However, present OCR technology cannot recognize all characters with 100% accuracy. It is therefore important to understand the impact of OCR accuracy on automatic text classification in order to assess its technical feasibility. In this research, we perform automatic text classification experiments on English newswire articles to study the relationship between OCR accuracy and text classification accuracy, using statistical classification techniques.
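As a rough illustration of why character-level OCR accuracy matters for word-based statistical classifiers, the sketch below simulates recognition errors and measures how many words survive intact. It is not the authors' experimental setup: the substitution-only noise model and the helper names `corrupt`, `char_accuracy`, and `word_survival` are assumptions made for this example.

```python
import random
import string


def corrupt(text, error_rate, rng):
    """Simulate OCR noise (assumed model): each letter is replaced by a
    different random lowercase letter with probability error_rate.
    Non-letters (spaces, digits, punctuation) are left unchanged."""
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < error_rate:
            choices = [c for c in string.ascii_lowercase if c != ch.lower()]
            out.append(rng.choice(choices))
        else:
            out.append(ch)
    return "".join(out)


def char_accuracy(reference, recognized):
    """Fraction of character positions that match.
    Valid only for substitution-only noise, where lengths are equal."""
    if len(reference) != len(recognized):
        raise ValueError("strings must have equal length")
    if not reference:
        return 1.0
    matches = sum(a == b for a, b in zip(reference, recognized))
    return matches / len(reference)


def word_survival(reference, recognized):
    """Fraction of whitespace-separated words recognized without any error.
    These intact words are the features a bag-of-words classifier can use."""
    ref_words = reference.split()
    rec_words = recognized.split()
    if not ref_words:
        return 1.0
    kept = sum(a == b for a, b in zip(ref_words, rec_words))
    return kept / len(ref_words)


if __name__ == "__main__":
    rng = random.Random(0)
    article = "stocks fell sharply as oil prices rose on supply concerns"
    for rate in (0.0, 0.05, 0.10, 0.20):
        noisy = corrupt(article, rate, rng)
        print(rate, round(char_accuracy(article, noisy), 3),
              round(word_survival(article, noisy), 3))
```

Under this noise model a word of n letters survives only if all n letters are recognized correctly, so word-level features tend to degrade faster than the character accuracy figure alone suggests.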
Pages: 403-409
Number of pages: 7