The impact of OCR accuracy on automatic text classification

Cited by: 0
Authors
Zu, GW
Murata, M
Ohyama, W
Wakabayashi, T
Kimura, F
Affiliations
[1] Mie Univ, Fac Engn, Tsu, Mie 5148507, Japan
[2] Toshiba Solut Corp, Syst Integrat Technol Ctr, Minato Ku, Tokyo 1056691, Japan
Source
CONTENT COMPUTING, PROCEEDINGS | 2004 / Volume 3309
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The current general approach to digitizing paper media is to convert them into digital images with a scanner and then read the images with OCR to generate ASCII text for full-text retrieval. However, present OCR technology cannot recognize all characters with 100% accuracy. It is therefore important to know the impact of OCR accuracy on automatic text classification in order to assess its technical feasibility. In this research, we perform automatic text classification experiments on English newswire articles to study the relationship between OCR accuracy and text classification accuracy, employing statistical classification techniques.
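The kind of experiment the abstract describes can be illustrated with a minimal sketch. This is not the authors' method: it assumes synthetic character substitution noise in place of real OCR output, a toy word-count classifier in place of the statistical techniques used in the paper, and made-up example data. The names `corrupt`, `train`, `classify`, `accuracy_at`, `TRAIN`, and `TEST` are all hypothetical.

```python
import random
import string
from collections import Counter

# Toy data, invented for illustration only (not from the paper).
TRAIN = [
    ("stocks market shares profit", "finance"),
    ("match goal team score", "sports"),
]
TEST = [
    ("profit shares market", "finance"),
    ("team goal score", "sports"),
]


def corrupt(text, error_rate, rng):
    """Simulate OCR errors: replace each letter, with probability
    error_rate, by a different random lowercase letter."""
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < error_rate:
            choices = [c for c in string.ascii_lowercase if c != ch.lower()]
            out.append(rng.choice(choices))
        else:
            out.append(ch)
    return "".join(out)


def train(docs):
    """Build one word-count table per label from (text, label) pairs."""
    model = {}
    for text, label in docs:
        model.setdefault(label, Counter()).update(text.lower().split())
    return model


def classify(model, text):
    """Pick the label whose training word counts best cover the text.
    Ties go to the alphabetically first label."""
    words = text.lower().split()
    return max(sorted(model), key=lambda label: sum(model[label][w] for w in words))


def accuracy_at(model, docs, error_rate, seed=0):
    """Classification accuracy on docs after injecting simulated OCR noise."""
    rng = random.Random(seed)
    correct = sum(
        classify(model, corrupt(text, error_rate, rng)) == label
        for text, label in docs
    )
    return correct / len(docs)


if __name__ == "__main__":
    model = train(TRAIN)
    for rate in (0.0, 0.1, 0.3):
        print(f"simulated character error rate {rate:.1f}: "
              f"accuracy {accuracy_at(model, TEST, rate):.2f}")
```

Sweeping the error rate and recording the classification accuracy at each level gives the kind of accuracy-versus-accuracy relationship the study examines, though real OCR errors are not uniformly random substitutions.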
Pages: 403-409
Page count: 7