Sar-graphs: A language resource connecting linguistic knowledge with semantic relations from knowledge graphs

Cited by: 12
Authors
Krause, Sebastian [1]
Hennig, Leonhard [1]
Moro, Andrea [2]
Weissenborn, Dirk [1]
Xu, Feiyu [1]
Uszkoreit, Hans [1]
Navigli, Roberto [2]
Affiliations
[1] DFKI Language Technol Lab, Alt Moabit 91c, D-10559 Berlin, Germany
[2] Univ Roma La Sapienza, Dipartimento Informat, Viale Regina Elena 295, I-00161 Rome, Italy
Source
JOURNAL OF WEB SEMANTICS | 2016 / Vol. 37-38
Keywords
Knowledge graphs; Language resources; Linguistic patterns; Relation extraction; WIKIPEDIA; ONTOLOGY;
DOI
10.1016/j.websem.2016.03.004
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Recent years have seen significant growth and increased usage of large-scale knowledge resources in both academic research and industry. We can distinguish two main types of knowledge resources: those that store factual information about entities in the form of semantic relations (e.g., Freebase), namely so-called knowledge graphs, and those that represent general linguistic knowledge (e.g., WordNet or UWN). In this article, we present a third type of knowledge resource which completes the picture by connecting the first two types. Instances of this resource are graphs of semantically-associated relations (sar-graphs), whose purpose is to link semantic relations from factual knowledge graphs with their linguistic representations in human language. We present a general method for constructing sar-graphs using a language- and relation-independent, distantly supervised approach which, apart from generic language processing tools, relies solely on the availability of a lexical semantic resource, providing sense information for words, as well as a knowledge base containing seed relation instances. Using these seeds, our method extracts, validates and merges relation-specific linguistic patterns from text to create sar-graphs. To cope with the noisily labeled data arising in a distantly supervised setting, we propose several automatic pattern confidence estimation strategies, and also show how manual supervision can be used to improve the quality of sar-graph instances. We demonstrate the applicability of our method by constructing sar-graphs for 25 semantic relations, of which we make a subset publicly available at http://sargraph.dfki.de. We believe sar-graphs will prove to be useful linguistic resources for a wide variety of natural language processing tasks, and in particular for information extraction and knowledge base population. We illustrate their usefulness with experiments in relation extraction and in computer-assisted language learning.
(C) 2016 Elsevier B.V. All rights reserved.
Pages: 112-131
Page count: 20