A corpus for OCR research on mathematical expressions

Cited by: 15
Authors
Garain U. [1]
Chaudhuri B.B. [1]
Affiliation
[1] Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Calcutta-700 035
Source
International Journal of Document Analysis and Recognition (IJDAR) | 2005 / Vol. 7 / No. 4
Keywords
Database; Groundtruthing; Mathematical expressions; OCR; Performance evaluation; Statistical learning
DOI
10.1007/s10032-004-0140-5
Abstract
This paper is concerned with research on OCR (optical character recognition) of printed mathematical expressions. The construction of a representative corpus of technical and scientific documents containing expressions is discussed. A statistical investigation of the corpus is presented, and the usefulness of this analysis is demonstrated for four related research problems: (i) identification and segmentation of expression zones from the rest of the document, (ii) recognition of expression symbols, (iii) interpretation of expression structures, and (iv) performance evaluation of a mathematical expression recognition system. Moreover, a groundtruthing format is proposed to facilitate automatic evaluation of expression recognition techniques. © Springer-Verlag Berlin/Heidelberg 2005.
Pages: 241-259
Page count: 18
Related papers
45 in total
  • [1] CAMIO: A Corpus for OCR in Multiple Languages
    Arrigo, Michael
    Strassel, Stephanie
    King, Nolan
    Tran, Thao
    Mason, Lisa
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1209 - 1216
  • [2] A Unified Approach for Development of Urdu Corpus for OCR and Demographic Purpose
    Choudhary, Prakash
    Nain, Neeta
    Ahmed, Mushtaq
    SEVENTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2014), 2015, 9445
  • [3] Recognition and retrieval of mathematical expressions
    Zanibbi, Richard
    Blostein, Dorothea
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2012, 15 (04) : 331 - 357
  • [4] Embedding a Mathematical OCR Module into OCRopus
    Yamazaki, Shinpei
    Furukori, Fumihiro
    Zhao, Qinzheng
    Shirai, Keiichiro
    Okamoto, Masayuki
    11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 2011, : 880 - 884
  • [5] Isolated structural error analysis of printed mathematical expressions
    Kumar, P. Pavan
    Agarwal, Arun
    Bhagvati, Chakravarthy
    PATTERN ANALYSIS AND APPLICATIONS, 2018, 21 (04) : 1097 - 1107
  • [6] Recognition of Ambiguous Mathematical Characters within Mathematical Expressions
    Naik, S. A.
    Metkewar, P. S.
    Mapari, S. A.
    PROCEEDINGS OF THE 2017 IEEE SECOND INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND COMMUNICATION TECHNOLOGIES (ICECCT), 2017,
  • [7] Graphical User Interface for Search of Mathematical Expressions with Regular Expressions
    Watabe, Takayuki
    Miyazaki, Yoshinori
    HUMAN-COMPUTER INTERACTION: DESIGN AND EVALUATION, PT I, 2015, 9169 : 438 - 447
  • [8] Syntactic Role Identification of Mathematical Expressions
    Wang, Xing
    Lin, Jason
    Vrecenar, Ryan
    Liu, Jyh-Charn
    2017 TWELFTH INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT (ICDIM), 2017, : 179 - 184
  • [9] Recognition of online handwritten mathematical expressions
    Garain, U
    Chaudhuri, BB
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2004, 34 (06) : 2366 - 2376
  • [10] Classifying Mathematical Expressions Written in MathML
    Kim, Shinil
    Yang, Seon
    Ko, Youngjoong
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (10) : 2560 - 2563