Improving Extraction of Chinese Open Relations Using Pre-trained Language Model and Knowledge Enhancement

Cited by: 0
Authors
Wen, Chaojie [1]
Jia, Xudong [1]
Chen, Tao [1]
Affiliation
[1] Wuyi University, Faculty of Intelligent Manufacturing, Jiangmen, Guangdong, People's Republic of China
Keywords
Chinese open relation extraction; Pre-trained language model; Knowledge enhancement; Open information extraction
DOI
10.1162/dint_a_00227
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Open Relation Extraction (ORE) is the task of extracting semantic relations from text documents. Compared with conventional systems, which depend heavily on feature engineering or syntactic parsing, current ORE systems extract Chinese relations significantly more efficiently. However, these systems do not use robust neural networks, such as pre-trained language models, to exploit large-scale unstructured data effectively. In response to this issue, this paper presents a new system, Chinese Open Relation Extraction with Knowledge Enhancement (CORE-KE). CORE-KE applies a pre-trained language model, supported by a Bidirectional Long Short-Term Memory (BiLSTM) layer and a Masked Conditional Random Field (Masked CRF) layer, to unstructured data in order to improve Chinese open relation extraction. The pre-trained language model is fine-tuned with entity descriptions from Wikidata and with additional knowledge, in the form of triple facts, extracted from Chinese ORE datasets. In addition, syntactic features are adopted during the training stage of CORE-KE for further knowledge enhancement. Experimental results on two large-scale datasets of open Chinese entities and relations show that CORE-KE outperforms other ORE systems: its F1-scores on the two datasets give relative improvements of 20.1% and 1.3%, respectively, over the benchmark ORE systems. The source code is available at https://github.com/cjwen15/CORE-KE.
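The abstract names a Masked CRF layer on top of the BiLSTM but does not describe how the masking works. The following is a minimal sketch of Masked CRF decoding under stated assumptions, not the CORE-KE implementation: it assumes a BIO tag scheme with a single hypothetical span type `REL`, toy hand-written scores, and illustrative names (`TAGS`, `build_mask`, `masked_viterbi`) that do not come from the CORE-KE repository. The mask rules out moves that cannot occur under BIO (starting a sequence with `I-REL`, or `O` followed by `I-REL`), and Viterbi decoding then searches only over legal tag paths. In the real system, emission scores would come from the BiLSTM and transition scores from the learned CRF parameters.

```python
# Minimal Masked CRF decoding sketch (Viterbi with illegal BIO transitions masked out).
# Assumption: a BIO scheme with one hypothetical span type "REL"; the CORE-KE tag set may differ.
TAGS = ["O", "B-REL", "I-REL"]
NEG_INF = float("-inf")


def build_mask(tags):
    """Return (start_ok, trans_ok): which tags may start a sequence and which prev->cur moves are legal."""
    # Under BIO, a sequence may not begin inside a span.
    start_ok = [not t.startswith("I-") for t in tags]
    # I-X is only legal after B-X or I-X; every other move is allowed.
    trans_ok = [
        [not cur.startswith("I-") or (prev[:2] in ("B-", "I-") and prev[2:] == cur[2:])
         for cur in tags]
        for prev in tags
    ]
    return start_ok, trans_ok


def masked_viterbi(emissions, transitions, start_ok, trans_ok):
    """Highest-scoring tag path that uses only legal starts and transitions.

    emissions[t][j]: score of tag j at position t (hypothetically, BiLSTM outputs).
    transitions[i][j]: score for moving from tag i to tag j (hypothetically, CRF parameters).
    """
    n = len(start_ok)
    # Illegal start tags get -inf so they can never be chosen.
    score = [e if ok else NEG_INF for e, ok in zip(emissions[0], start_ok)]
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Score of reaching tag j from each previous tag i; masked moves get -inf.
            cands = [score[i] + transitions[i][j] if trans_ok[i][j] else NEG_INF
                     for i in range(n)]
            best_i = max(range(n), key=cands.__getitem__)
            ptr.append(best_i)
            new_score.append(cands[best_i] + emit[j])
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(range(n), key=score.__getitem__)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy scores: position 0 prefers I-REL, which is an illegal start under BIO.
emissions = [[0.0, 1.0, 3.0],
             [0.0, 0.0, 2.0],
             [2.0, 0.0, 0.0]]
transitions = [[0.0] * 3 for _ in range(3)]
start_ok, trans_ok = build_mask(TAGS)
path = masked_viterbi(emissions, transitions, start_ok, trans_ok)
print([TAGS[i] for i in path])  # ['B-REL', 'I-REL', 'O']
```

Without the mask (all starts and transitions allowed), the same scores decode to `I-REL, I-REL, O`, which is not a valid BIO sequence; closing that gap is the purpose of the masking. Masked CRF variants typically apply the same mask when computing the training loss as well; this sketch covers decoding only.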
Pages: 962-989
Number of pages: 28