Prediction of Mathematical Expression Declarations based on Spatial, Semantic, and Syntactic Analysis

Cited by: 4
Authors
Lin, Jason [1]
Wang, Xing [1]
Wang, Zelun [1]
Beyette, Donald [1]
Liu, Jyh-Charn [1]
Affiliation
[1] Texas A&M Univ, Dept Comp Sci & Engn, College Stn, TX 77843 USA
Source
DOCENG'19: PROCEEDINGS OF THE ACM SYMPOSIUM ON DOCUMENT ENGINEERING 2019 | 2019
Keywords
Mathematical expression; Declaration extraction; Co-reference
DOI
10.1145/3342558.3345399
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Mathematical expressions (MEs) and words are tightly bound together in most science, technology, engineering, and mathematics (STEM) documents, where they give, respectively, quantitative and qualitative descriptions of the system model under discussion. This paper proposes a general model for finding the co-reference relations between words and MEs, based on which we developed a novel algorithm for predicting the natural language declarations of MEs (ME-Dec). The prediction algorithm is applied in a three-level framework. The first level is a customized tagger that identifies the syntactic roles of MEs and the part-of-speech (POS) tags of words in sentences that mix MEs and words. The second level screens the ME-Dec candidates based on the hypothesis that most ME-Dec are noun phrases (NPs). A shallow chunker is trained with the fuzzy process mining algorithm, which takes the labeled POS tag sequences in the NTCIR-10 dataset as input and mines the frequent syntactic patterns of NPs. In the third level, the bonding model between MEs and ME-Dec candidates is trained on the NTCIR-10 training set, using distance, word stem, and POS tag as the spatial, semantic, and syntactic features, respectively. The final predictions are made by majority vote of an ensemble of Naive Bayesian classifiers based on the three features. Evaluated on the NTCIR-10 test set, the proposed algorithm achieved average F1 scores of 75% and 71% under soft matching and strict matching, respectively, outperforming the state-of-the-art solutions by a margin of 5-18%.
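The third-level voting step described in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: that the ensemble holds exactly one Naive Bayes classifier per feature, that distance is bucketed into "near"/"far", the Laplace smoothing, and the names `SingleFeatureNB` and `is_declaration`. The training rows are invented toy data, not NTCIR-10 samples.

```python
import math
from collections import Counter, defaultdict


class SingleFeatureNB:
    """Naive Bayes over one categorical feature, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, values, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)
        for value, label in zip(values, labels):
            self.value_counts[label][value] += 1
        self.vocab = set(values)
        return self

    def predict(self, value):
        total = sum(self.class_counts.values())
        slots = len(self.vocab) + 1  # one extra slot for unseen values
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            n = self.class_counts[label]
            likelihood = (self.value_counts[label][value] + self.alpha) / (
                n + self.alpha * slots
            )
            score = math.log(n / total) + math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy rows (hypothetical): (distance bucket, word stem, POS tag, is ME-Dec?)
TRAIN = [
    ("near", "veloc", "NN", True),
    ("near", "mass", "NN", True),
    ("near", "radius", "NN", True),
    ("far", "paper", "NN", False),
    ("far", "show", "VB", False),
    ("near", "denot", "VB", False),
    ("far", "section", "NN", False),
]

LABELS = [row[3] for row in TRAIN]
# One classifier per feature: spatial, semantic, syntactic (an assumption).
MODELS = [
    SingleFeatureNB().fit([row[i] for row in TRAIN], LABELS) for i in range(3)
]


def is_declaration(candidate):
    """Majority vote of the three single-feature classifiers."""
    votes = [model.predict(value) for model, value in zip(MODELS, candidate)]
    return Counter(votes).most_common(1)[0][0]


print(is_declaration(("near", "temperatur", "NN")))
```

In this toy setup the stem "temperatur" never appears in training, so the stem classifier alone falls back to the majority class, while the distance and POS voters still carry the decision. With three binary voters a tie cannot occur.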
Pages: 10
Related papers
7 in total
  • [1] Syntactic data generation for handwritten mathematical expression recognition
    Thanh-Nghia Truong
    Cuong Tuan Nguyen
    Nakagawa, Masaki
    PATTERN RECOGNITION LETTERS, 2022, 153 : 83 - 91
  • [2] A Semantic Approach for Mathematical Expression Retrieval
    Asebriy, Zahra
    Raghay, Said
    Kaloun, Soulaimane
    Bencharef, Omar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (09) : 190 - 194
  • [3] Structural analysis and semantic understanding for offline mathematical expressions
    Chen, Y
    Okada, M
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2001, 15 (06) : 967 - 987
  • [4] Prediction of Mathematical Expression Constraints (ME-Con)
    Lin, Jason
    Wang, Xing
    Liu, Jyh-Charn
    PROCEEDINGS OF THE ACM SYMPOSIUM ON DOCUMENT ENGINEERING (DOCENG 2018), 2018
  • [5] A Content-Constrained Spatial (CCS) Model for Layout Analysis of Mathematical Expressions
    Wang, Xing
    Liu, Jyh-Charn
    2017 TWELFTH INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT (ICDIM), 2017, : 334 - 339
  • [6] CNN based spatial classification features for clustering offline handwritten mathematical expressions
    Cuong Tuan Nguyen
    Vu Tran Minh Khuong
    Hung Tuan Nguyen
    Nakagawa, Masaki
    PATTERN RECOGNITION LETTERS, 2020, 131 (131) : 113 - 120
  • [7] An optimized neural Network-based character recognition and relation finding for mathematical expression images
    Sharada, H. N.
    Anami, Basavaraj
    Allagi, Shridhar
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (19) : 57163 - 57185