XLORE 3: A Large-Scale Multilingual Knowledge Graph from Heterogeneous Wiki Knowledge Resources

Cited by: 0
Authors
Zeng, Kaisheng [1]
Jin, Hailong [1]
Lv, Xin [1]
Zh, Fangwei [2]
Hou, Lei [1]
Zhang, Yi [1]
Pang, Fan [1]
Qi, Yu [1]
Liu, Dingxiao [1]
Li, Juanzi [1]
Feng, Ling [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Peking Univ, Beijing, Peoples R China
Keywords
Knowledge graph; knowledge management; knowledge fusion; knowledge completion; schema construction; entity typing; entity alignment; entity linking; ENTITY; CONSTRUCTION
DOI
10.1145/3660521
CLC number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, knowledge graphs (KGs) have attracted significant attention from academia and industry, resulting in the development of numerous technologies for KG construction, completion, and application. XLORE is one of the largest multilingual KGs, built from Baidu Baike and Wikipedia via a series of knowledge modeling and acquisition methods. In this article, we apply systematic methods to improve XLORE's data quality and present its latest version, XLORE 3, which enables the effective integration and management of heterogeneous knowledge from diverse resources. Compared with previous versions, XLORE 3 has three major advantages: (1) We design a comprehensive and reasonable schema, the XLORE ontology, which can effectively organize and manage entities from various resources. (2) We merge equivalent entities in different languages to facilitate knowledge sharing, and we provide a large-scale entity linking system to establish associations between unstructured text and the structured KG. (3) We design a multi-strategy knowledge completion framework, which leverages pre-trained language models and vast amounts of unstructured text to discover missing and new facts. The resulting KG contains 446 concepts, 2,608 properties, 66 million entities, and more than 2 billion facts. It is available for download at https://www.xlore.cn/, providing a valuable resource for researchers and practitioners in various fields.
Pages: 47