ProMvSD: Towards unsupervised knowledge graph anomaly detection via prior knowledge integration and multi-view semantic-driven estimation

Times cited: 4

Authors:

Zhou, Yunfeng [1]

Zhu, Cui [1]

Zhu, Wenjun [1]

Affiliation:

[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China

Source:

INFORMATION PROCESSING & MANAGEMENT | 2024 / Vol. 61 / No. 4

Funding:

National Natural Science Foundation of China;

Keywords:

Knowledge graph; Anomaly detection; Pre-trained language models; Semantics; Unsupervised learning;

DOI:

10.1016/j.ipm.2024.103705

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline code:

0812 ;

Abstract:

Knowledge graphs (KGs) have found extensive applications in intelligent systems such as information retrieval. Most research has focused on completing missing knowledge, with little attention paid to detecting errors. Unfortunately, when KGs are customized, the introduction of diverse, unpredictable errors is virtually unavoidable, and these anomalies significantly degrade the performance of downstream applications. Detecting erroneous knowledge is a formidable challenge because ground-truth labels are costly to acquire. In this work, we develop an unsupervised anomaly detection framework named ProMvSD, which adapts to KGs of varying scales via serialization components. To compensate for the insufficient contextual information provided by the topological structure, we introduce a large language model as a reasoner that extracts prior knowledge from extensive pre-trained textual data, thereby enhancing the understanding of KGs. Anomalous triples may produce a larger semantic gap between the head and tail neighborhoods. To uncover latent anomalies effectively, we propose a multi-view semantic-driven model (MvSD) based on the assumptions of self-consistency and information stability. MvSD jointly estimates the suspiciousness of triples from three hyperviews: node-view semantic contradiction, triple-view semantic gap, and pathway-view semantic gap. Extensive experiments on three English benchmark KGs and a Chinese medical KG demonstrate that, among the top 1% most suspicious triples, we detect real anomalies with up to 99.9% accuracy. Furthermore, ProMvSD significantly outperforms state-of-the-art representation learning baselines, achieving a 29.2% improvement in detecting all anomalies.
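The abstract does not say how MvSD computes or fuses its three view scores, so the Python sketch below only illustrates the general idea under stated assumptions. It measures a "semantic gap" as the cosine distance between the mean embeddings of a head and a tail neighborhood, fuses three per-triple view scores with min-max normalization and equal weights, and reports precision among the top 1% most suspicious triples (the evaluation setting quoted in the abstract). All function names (`cosine_distance`, `neighborhood_gap`, `combine_views`, `precision_at_top_fraction`), the normalization, and the weights are hypothetical and are not the authors' method.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; a hypothetical stand-in for a 'semantic gap'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an empty/zero embedding as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized embedding vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def neighborhood_gap(head_nbrs: List[Sequence[float]],
                     tail_nbrs: List[Sequence[float]]) -> float:
    """Assumed gap between head and tail neighborhoods: distance of their means."""
    return cosine_distance(mean_vector(head_nbrs), mean_vector(tail_nbrs))


def min_max(xs: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; constant inputs map to 0."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def combine_views(node_view: List[float], triple_view: List[float],
                  pathway_view: List[float],
                  weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> List[float]:
    """Hypothetical fusion: normalize each view, then take a weighted sum."""
    views = [min_max(node_view), min_max(triple_view), min_max(pathway_view)]
    return [sum(w * view[i] for w, view in zip(weights, views))
            for i in range(len(node_view))]


def precision_at_top_fraction(scores: List[float], labels: List[int],
                              fraction: float = 0.01) -> float:
    """Share of true anomalies (label 1) among the top `fraction` of triples."""
    k = max(1, round(len(scores) * fraction))  # nearest integer, at least one
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


if __name__ == "__main__":
    # Toy data: 200 triples, two injected anomalies with high view scores.
    n = 200
    node, triple, path, labels = [0.1] * n, [0.2] * n, [0.3] * n, [0] * n
    for i in (5, 17):
        node[i], triple[i], path[i], labels[i] = 0.9, 0.8, 0.7, 1
    scores = combine_views(node, triple, path)
    print(precision_at_top_fraction(scores, labels))  # 1.0
```

With 200 triples, the top 1% is two triples; both injected anomalies score highest after fusion, so the toy precision is 1.0. A real system would learn the view scores from (LLM-enriched) embeddings rather than receive them as constants.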

Pages: 20

Cited references: 60
