Math-word embedding in math search and semantic extraction

Cited by: 23
Authors
Greiner-Petter, Andre [1 ]
Youssef, Abdou [2 ,3 ]
Ruas, Terry [1 ]
Miller, Bruce R. [3 ]
Schubotz, Moritz [1 ,4 ]
Aizawa, Akiko [5 ]
Gipp, Bela [1 ]
Affiliations
[1] Univ Wuppertal, Wuppertal, Germany
[2] George Washington Univ, Washington, DC USA
[3] NIST, Appl & Computat Math Div, Gaithersburg, MD 20899 USA
[4] FIZ Karlsruhe, Berlin, Germany
[5] Natl Inst Informat, Tokyo, Japan
Keywords
Mathematical information retrieval; Math search; Semantic extraction; Machine learning; Word embedding; Math embedding; Representation
DOI
10.1007/s11192-020-03502-9
CLC classification number
TP39 [Computer applications];
Subject classification codes
081203; 0835;
Abstract
Word embedding, which represents individual words with semantically fixed-length vectors, has made it possible to successfully apply deep learning to natural language processing tasks such as semantic role-modeling, question answering, and machine translation. As math text consists of natural text, as well as math expressions that similarly exhibit linear correlation and contextual characteristics, word embedding techniques can also be applied to math documents. However, while mathematics is a precise and accurate science, it is usually expressed through imprecise and less accurate descriptions, contributing to the relative dearth of machine learning applications for information retrieval in this domain. Generally, mathematical documents communicate their knowledge with an ambiguous, context-dependent, and non-formal language. Given recent advances in word embedding, it is worthwhile to explore their use and effectiveness in math information retrieval tasks, such as math language processing and semantic knowledge extraction. In this paper, we explore math embedding by testing it on several different scenarios, namely, (1) math-term similarity, (2) analogy, (3) numerical concept-modeling based on the centroid of the keywords that characterize a concept, (4) math search using query expansions, and (5) semantic extraction, i.e., extracting descriptive phrases for math expressions. Due to the lack of benchmarks, our investigations were performed using the arXiv collection of STEM documents and carefully selected illustrations on the Digital Library of Mathematical Functions (DLMF: NIST digital library of mathematical functions. Release 1.0.20 of 2018-09-1, 2018). Our results show that math embedding holds much promise for similarity, analogy, and search tasks. However, we also observed the need for more robust math embedding approaches. 
Moreover, we explore and discuss fundamental issues that we believe thwart the progress in mathematical information retrieval in the direction of machine learning.
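To make scenario (3) from the abstract concrete, the following is a minimal illustrative sketch (not the paper's implementation) of concept modeling via the centroid of keyword embeddings: a concept is represented by the mean vector of its characteristic keywords, and candidate terms are ranked by cosine similarity to that centroid. The toy 4-dimensional vectors below are placeholders for trained math-word embeddings.

```python
import numpy as np

# Toy stand-ins for trained math-word embeddings (hypothetical values).
embeddings = {
    "gamma":     np.array([0.9, 0.1, 0.0, 0.2]),
    "factorial": np.array([0.8, 0.2, 0.1, 0.1]),
    "integral":  np.array([0.7, 0.1, 0.3, 0.0]),
    "matrix":    np.array([0.0, 0.9, 0.1, 0.4]),
}

def centroid(words):
    """Average the embeddings of the keywords that characterize a concept."""
    return np.mean([embeddings[w] for w in words], axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Model a concept (e.g., the Gamma function) by its characteristic keywords,
# then rank all vocabulary terms by similarity to the concept centroid.
concept = centroid(["gamma", "factorial", "integral"])
ranking = sorted(embeddings, key=lambda w: cosine(concept, embeddings[w]),
                 reverse=True)
print(ranking)
```

With real embeddings trained on a corpus such as arXiv, the same centroid-and-rank procedure lets unrelated terms (here, "matrix") fall to the bottom of the ranking.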
Pages: 3017-3046
Page count: 30
References
77 in total
[1]  
Aizawa A., 2014, Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies, P88
[2]  
Aizawa A., 2013, Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies, P654
[3]  
ALMasri Mohannad, 2016, EUR C INF RETR, P709, DOI 10.1007/978-3-319-30671-1_57
[4]  
ALTAMIMI M, 2007, ISCA 16 INT C SOFTW
[5]  
[Anonymous], 2017, Proceedings of CoNLL 2017, DOI 10.18653/v1/K17-1012
[6]  
Bengio Y, 2001, ADV NEUR IN, V13, P932
[7]  
Bowman Samuel R., 2015, EMNLP, P632
[8]  
Bruce Croft W., 2009, Search Engines-Information Retrieval in Practice
[9]  
Camacho-Collados J, 2015, PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1, P741
[10]  
Caselles-Dupre Hugo, Lesaint Florian, Royo-Letelier Jimena, 2018, Word2vec applied to Recommendation: Hyperparameters Matter, 12TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS), P352