Improving Open Information Extraction with Distant Supervision Learning

Cited by: 3
Authors
Han, Jiabao [1]
Wang, Hongzhi [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Peoples R China
Keywords
Distant supervision learning; Open information extraction; Neural network; Sequence-to-sequence model
DOI
10.1007/s11063-021-10548-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open information extraction (Open IE), one of the essential applications of Natural Language Processing (NLP), has gained great attention in recent years. As a critical technology for building Knowledge Bases (KBs), it converts unstructured natural language sentences into structured representations, usually expressed as triples. Most conventional Open IE approaches either rely on manually pre-defined extraction patterns or learn patterns from labeled training examples, both of which require substantial human effort. They also involve many NLP tools, which leads to error accumulation and propagation. With the rapid development of neural networks, neural models can minimize error propagation, but they are data-hungry under supervised learning. In particular, they rely on existing Open IE tools to generate training data, which causes data quality issues. In this paper, we employ a distant supervision learning approach to improve the Open IE task. We conduct extensive experiments with two popular sequence-to-sequence models (RNN and Transformer) and a large benchmark data set to demonstrate the performance of our approach.
Pages: 3287-3306
Number of pages: 20