Improving Open Information Extraction with Distant Supervision Learning

Cited by: 3

Authors:

Han, Jiabao [1]

Wang, Hongzhi [1]

Affiliations:

[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, People's Republic of China

Source:

NEURAL PROCESSING LETTERS | 2021 / Volume 53 / Issue 05

Keywords:

Distant supervision learning; Open information extraction; Neural network; Sequence-to-sequence model

DOI:

10.1007/s11063-021-10548-0

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Open information extraction (Open IE), one of the essential applications in Natural Language Processing (NLP), has gained great attention in recent years. As a critical technology for building Knowledge Bases (KBs), it converts unstructured natural language sentences into structured representations, usually expressed as triples. Most conventional Open IE approaches either rely on manually pre-defined extraction patterns or learn patterns from labeled training examples, both of which require substantial human effort. Additionally, they involve many NLP tools, which leads to error accumulation and propagation. With the rapid development of neural networks, neural models can minimize the error propagation problem, but under supervised learning they are data-hungry. In particular, they rely on existing Open IE tools to generate training data, which causes data quality issues. In this paper, we employ a distant supervision learning approach to improve the Open IE task. We conduct extensive experiments with two popular sequence-to-sequence models (RNN and Transformer) and a large benchmark data set to demonstrate the performance of our approach.

Pages: 3287-3306

Page count: 20

49 references in total

