Revisiting the Evaluation Protocol of Knowledge Graph Completion Methods for Link Prediction

被引：6

作者：

Tiwari, Sudhanshu ^{[1
]}

Bansal, Iti ^{[1
]}

Rivero, Carlos R. ^{[1
]}

机构：

[1] Rochester Inst Technol, Rochester, NY 14623 USA

来源：

PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) | 2021年

关键词：

Knowledge graph completion; Link prediction; Benchmarking; Evaluation protocol; Metrics; Data anomalies;

D O I：

10.1145/3442381.3449856

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Completion methods learn models to infer missing (subject, predicate, object) triples in knowledge graphs, a task known as link prediction. The training phase is based on samples of positive triples and their negative counterparts. The test phase consists of ranking each positive triple with respect to its negative counterparts based on the scores obtained by a learned model. The best model ranks all positive triples first. Metrics like mean rank, mean reciprocal rank and hits at k are used to assess accuracy. Under this generic evaluation protocol, we observe several shortcomings: 1) Current metrics assume that each measurement is upper bounded by the same constant value and, therefore, are oblivious to the fact that, in link prediction, each positive triple may have a different number of negative counterparts, which alters the difficulty of ranking positive triples. 2) Benchmarking datasets contain anomalies (unrealistic redundancy) that allegedly simplifies link prediction; however, current instantiations of the generic evaluation protocol do not integrate anomalies, which are just discarded based on a user-defined threshold. 3) Benchmarking datasets have been randomly split, which typically alters the graph topology and results in the training split not resembling the original dataset. 4) A single model is typically kept based on its accuracy over the validation split using a given metric; however, since metrics aggregate ranks into a single value, there may be no significant differences among the ranks produced by several models, which must be all evaluated in the test phase. In this paper, we contribute to the evaluation of link prediction as follows: 1) We propose a variation of the mean rank that considers the number of negative counterparts. 2) We define the anomaly coefficient of a predicate and integrate such coefficient in the protocol. 3) We propose a downscaling algorithm to generate training splits that reflect the original graph topology based on a nonparametric, unpaired statistical test. 4) During validation, we discard a learned model only if its output ranks are significantly different than other ranks based on a nonparametric, paired statistical test. Our experiments over seven well-known datasets show that translation-based methods (TransD, TransE and TransH) significantly outperform recent methods, which entails that our understanding of the accuracy of completion methods for link prediction is far from perfect.

引用

页码：809 / 820

页数：12

共 52 条

[1] Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study [J].

Akrami, Farahnaz ;

Saeef, Mohammed Samiul ;

Zhang, Qingheng ;

Hu, Wei ;

Li, Chengkai .

SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, :1995-2010

[2]

Ali M., 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence

[3]

Ali Mehdi, 2020, ABS200714175 CORR

[4]

[Anonymous], 2017, SEMANT WEB

[5] AYNEC: All You Need for Evaluating Completion Techniques in Knowledge Graphs [J].

Ayala, Daniel ;

Borrego, Agustin ;

Hernandez, Inma ;

Rivero, Carlos R. ;

Ruiz, David .

SEMANTIC WEB, ESWC 2019, 2019, 11503 :397-411

[6] The Impact of Negative Triple Generation Strategies and Anomalies on Knowledge Graph Completion [J].

Bansal, Iti ;

Tiwari, Sudhanshu ;

Rivero, Carlos R. .

CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, :45-54

[7]

Berrendorf Max, 2020, CORRABS200206914

[8]

Bollacker K, 2008, P 2008 ACM SIGMOD IN, P1247, DOI 10.1145/1376616.1376746

[9]

Bordes A., 2013, ADV NEURAL INFORM PR, V26, P2787, DOI DOI 10.5555/2999792.2999923

[10] Constructing and Mining Web-Scale Knowledge Graphs [J].

Bordes, Antoine ;

Gabrilovich, Evgeniy .

PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014, :1967-1967

← 1 2 3 4 5 6 →