Mathematical Expression Extraction from Unstructured Plain Text

Citations: 0
Authors
Fernando, Kulakshi [1]
Ranathunga, Surangika [1]
Dias, Gihan [1]
Affiliations
[1] Univ Moratuwa, Dept Comp Sci & Engn, Katubedda, Sri Lanka
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2019) | 2019, Vol. 11608
Keywords
Mathematical expression extraction; Sequential tagging; Information extraction
DOI
10.1007/978-3-030-23281-8_26
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Mathematical expressions are often embedded inline in unstructured plain text on the web and in documents. They range from numbers and variable names to expressions of moderate complexity. Traditional rule-based techniques for mathematical expression extraction do not scale well across a wide range of expression types, and they are less robust to expressions with slight typos and lexical ambiguities. This research employs sequential classifiers as well as deep learning classifiers to identify mathematical expressions in unstructured text. We compare four models: CRF, LSTM, Bi-LSTM with word embeddings, and Bi-LSTM with word and character embeddings. The models were trained on a dataset containing 102K tokens and 9K mathematical expressions. Given the relatively small dataset, the CRF model outperformed the RNN models.
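To make the sequential-tagging framing concrete, the sketch below shows how per-token labels from any of the compared taggers could be turned back into expression spans. The abstract does not state the tag scheme, so the BIO labels (`B-MATH`, `I-MATH`, `O`), the function name `extract_expressions`, and the sample sentence are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def extract_expressions(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO-tagged tokens into (start, end, text) spans; end is exclusive.

    The B-MATH / I-MATH / O label set is an assumed scheme, not one
    confirmed by the paper's abstract.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[int, int, str]] = []
    start = None  # index where the currently open span began, if any

    def close(end: int) -> None:
        # Emit the open span covering tokens[start:end].
        spans.append((start, end, " ".join(tokens[start:end])))

    for i, tag in enumerate(tags):
        if tag == "B-MATH" or (tag == "I-MATH" and start is None):
            # A new expression begins; a stray I-MATH is treated as a beginning.
            if start is not None:
                close(i)
            start = i
        elif tag == "O":
            if start is not None:
                close(i)
            start = None
        # An I-MATH tag inside an open span simply extends it.

    if start is not None:
        close(len(tokens))
    return spans


# Hypothetical tagger output for one sentence.
tokens = "The area is A = pi * r ^ 2 when r > 0 .".split()
tags = ["O", "O", "O",
        "B-MATH", "I-MATH", "I-MATH", "I-MATH", "I-MATH", "I-MATH", "I-MATH",
        "O",
        "B-MATH", "I-MATH", "I-MATH",
        "O"]

for start, end, text in extract_expressions(tokens, tags):
    print(start, end, text)
```

Treating a stray `I-MATH` as the start of a span is one tolerant design choice; a stricter decoder could discard such tokens instead.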
Pages: 312-320
Page count: 9