FoodKG: A Tool to Enrich Knowledge Graphs Using Machine Learning Techniques

Cited by: 18
Authors
Gharibi, Mohamed [1]
Zachariah, Arun [2]
Rao, Praveen [2, 3]
Affiliations
[1] Univ Missouri, Dept Comp Sci & Elect Engn, Kansas City, MO 64110 USA
[2] Univ Missouri, Dept Elect Engn & Comp Sci, Columbia, MO USA
[3] Univ Missouri, Dept Hlth Management & Informat, Columbia, MO USA
Source
FRONTIERS IN BIG DATA | 2020 / Volume 3
Funding
U.S. National Science Foundation
Keywords
machine learning; graph embeddings; knowledge graphs; AGROVOC; semantic similarity; WEB
DOI
10.3389/fdata.2020.00012
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
While there exist a plethora of datasets on the Internet related to Food, Energy, and Water (FEW), there is a real lack of reliable methods and tools that can consume these resources. This hinders the development of novel decision-making applications utilizing knowledge graphs. In this paper, we introduce a novel software tool, called FoodKG, that enriches FEW knowledge graphs using advanced machine learning techniques. Our overarching goal is to improve decision-making and knowledge discovery as well as to provide improved search results for data scientists in the FEW domains. Given an input knowledge graph (constructed on raw FEW datasets), FoodKG enriches it with semantically related triples, relations, and images based on the original dataset terms and classes. FoodKG employs an existing graph embedding technique trained on a controlled vocabulary called AGROVOC, which is published by the Food and Agriculture Organization of the United Nations. AGROVOC includes terms and classes in the agriculture and food domains. As a result, FoodKG can enhance knowledge graphs with semantic similarity scores and relations between different classes, classify the existing entities, and allow FEW experts and researchers to use scientific terms for describing FEW concepts. The resulting model obtained after training on AGROVOC was evaluated against the state-of-the-art word embedding and knowledge graph embedding models that were trained on the same dataset. We observed that this model outperformed its competitors based on the Spearman Correlation Coefficient score.
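The abstract mentions semantic similarity scores between terms and an evaluation based on the Spearman correlation coefficient. The following is a minimal sketch of how such an evaluation is commonly set up, not the authors' implementation: cosine similarity is computed between embedding vectors for term pairs, and the resulting scores are rank-correlated with reference ratings. The term names, vectors, and ratings are made-up toy values chosen only for illustration, and the helper names (`cosine_similarity`, `average_ranks`, `spearman`) are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy embeddings (assumed values, NOT taken from the paper or from AGROVOC).
embeddings = {
    "wheat": [0.9, 0.1, 0.0],
    "barley": [0.8, 0.2, 0.1],
    "river": [0.0, 0.2, 0.9],
    "irrigation": [0.3, 0.3, 0.7],
}

# Term pairs with illustrative reference ratings (also assumed).
pairs = [
    ("wheat", "barley"),
    ("river", "irrigation"),
    ("wheat", "irrigation"),
    ("wheat", "river"),
]
reference_scores = [9.0, 7.5, 4.0, 1.0]

model_scores = [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]
rho = spearman(model_scores, reference_scores)
print(f"Spearman rho = {rho:.3f}")
```

In this toy setup the model scores and the reference ratings order the four pairs identically, so the coefficient comes out at 1 up to floating-point rounding. With real data, a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written rank code.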
Pages: 12