Prediction of Mathematical Expression Declarations based on Spatial, Semantic, and Syntactic Analysis

Times cited: 4

Authors:

Lin, Jason [1]

Wang, Xing [1]

Wang, Zelun [1]

Beyette, Donald [1]

Liu, Jyh-Charn [1]

Affiliation:

[1] Texas A&M Univ, Dept Comp Sci & Engn, College Stn, TX 77843 USA

Source:

DOCENG'19: PROCEEDINGS OF THE ACM SYMPOSIUM ON DOCUMENT ENGINEERING 2019 | 2019

Keywords:

Mathematical expression; Declaration extraction; Co-reference

DOI:

10.1145/3342558.3345399

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Mathematical expressions (MEs) and words are closely bonded together in most science, technology, engineering, and mathematics (STEM) documents. MEs give a quantitative description of the system model under discussion, and words give a qualitative one. This paper proposes a general model for finding the co-reference relations between words and MEs. Based on this model, we developed a novel algorithm for predicting the natural language declarations of MEs (ME-Dec). The prediction algorithm is applied in a three-level framework. The first level is a customized tagger that identifies the syntactic roles of MEs and the part-of-speech (POS) tags of words in mixed ME-word sentences. The second level screens ME-Dec candidates based on the hypothesis that most ME-Dec are noun phrases (NPs). It uses a shallow chunker trained with a fuzzy process mining algorithm, which takes the labeled POS tag sequences in the NTCIR-10 dataset as input and mines them for frequent syntactic patterns of NPs. The third level uses distance, word stem, and POS tag as the spatial, semantic, and syntactic features, respectively, and trains the bonding model between MEs and ME-Dec candidates on the NTCIR-10 training set. Final predictions are made by majority vote of an ensemble of Naive Bayesian classifiers based on the three features. Evaluated on the NTCIR-10 test set, the proposed algorithm achieved average F1 scores of 75% in soft matching and 71% in strict matching. These results outperform state-of-the-art solutions by a margin of 5-18%.
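The third level (a majority vote over Naive Bayes classifiers built on the spatial, semantic, and syntactic features) can be illustrated with a minimal sketch. The abstract does not say how the features are encoded or parameterized, so this sketch rests on several assumptions:

* Each (ME, candidate NP) pair is already reduced to three categorical values. Distance is bucketed into hypothetical categories such as "adjacent", "near", and "far".
* Each feature gets its own single-feature Naive Bayes classifier with Laplace smoothing.
* The label is binary: the candidate either declares the ME or it does not.

The names `CategoricalNB`, `train_ensemble`, and `predict_bond`, and the toy training data, are hypothetical illustrations. They are not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes over ONE categorical feature, with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, values, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # label -> counts of feature values
        for value, label in zip(values, labels):
            self.value_counts[label][value] += 1
        self.vocab_size = len(set(values))
        return self

    def predict(self, value):
        total = sum(self.class_counts.values())
        k = self.vocab_size + 1  # +1 keeps some probability mass for unseen values
        best_label, best_score = None, -math.inf
        # Labels are visited in sorted order, so ties go to the label that sorts first.
        for label, n in sorted(self.class_counts.items()):
            log_prior = math.log(n / total)
            count = self.value_counts[label][value]
            log_lik = math.log((count + self.alpha) / (n + self.alpha * k))
            if log_prior + log_lik > best_score:
                best_label, best_score = label, log_prior + log_lik
        return best_label


# Hypothetical feature names: one classifier per feature type.
FEATURES = ("distance", "stem", "pos")  # spatial, semantic, syntactic


def train_ensemble(pairs, labels):
    """Fit one single-feature Naive Bayes model per feature."""
    return {f: CategoricalNB().fit([p[f] for p in pairs], labels) for f in FEATURES}


def predict_bond(models, pair):
    """Majority vote of the per-feature models (3 voters, binary labels, so no ties)."""
    votes = Counter(models[f].predict(pair[f]) for f in FEATURES)
    return votes.most_common(1)[0][0]


# Toy (ME, candidate NP) pairs; values are invented for illustration only.
TRAIN_PAIRS = [
    {"distance": "adjacent", "stem": "vector", "pos": "NN"},
    {"distance": "adjacent", "stem": "matrix", "pos": "NN"},
    {"distance": "near", "stem": "function", "pos": "NNS"},
    {"distance": "far", "stem": "section", "pos": "NNP"},
    {"distance": "far", "stem": "result", "pos": "VBZ"},
    {"distance": "near", "stem": "figure", "pos": "NNP"},
]
TRAIN_LABELS = [True, True, True, False, False, False]  # True = candidate declares the ME
```

Here is how the vote works on one example. Call `predict_bond(train_ensemble(TRAIN_PAIRS, TRAIN_LABELS), {"distance": "adjacent", "stem": "section", "pos": "NN"})`. The call returns `True`: the distance and POS classifiers both vote for a bond, and together they outvote the stem classifier. Using an odd number of voters with binary labels means the majority is always well defined.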

Pages: 10
