A knowledge graph embeddings based approach for author name disambiguation using literals

Cited by: 0
Authors
Cristian Santini
Genet Asefa Gesese
Silvio Peroni
Aldo Gangemi
Harald Sack
Mehwish Alam
Affiliations
[1] FIZ Karlsruhe – Leibniz Institute for Information Infrastructure
[2] University of Bologna
[3] Karlsruhe Institute of Technology
[4] Institute AIFB
Source
Scientometrics | 2022 / Vol. 127
Keywords
Author Name Disambiguation; Bibliographic data; Citation data; Clustering; Knowledge graph embeddings; Open citations
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Scholarly data is growing continuously and contains information about articles from a plethora of venues, including conferences and journals. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize the data and make them accessible have also raised many challenges, such as the exploration of scholarly articles and ambiguous author names. This study specifically targets the problem of Author Name Disambiguation (AND) on scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) built with multimodal literal information generated from these KGs. The framework is based on three components: (1) multimodal KGEs, (2) a blocking procedure, and (3) hierarchical agglomerative clustering. Extensive experiments have been conducted on two newly created KGs: (i) a KG containing information from the journal Scientometrics from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines by 8–14% in terms of F1 score and shows competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available through GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855), respectively.
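The blocking and clustering components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the blocking key (last name plus first initial), the average linkage, the cosine distance, the threshold of 0.3, and the toy two-dimensional vectors standing in for learned KGEs are all assumptions made for illustration. The real code is in the GitHub repository linked above.

```python
from collections import defaultdict
from itertools import combinations
import math


def block_key(name):
    """Hypothetical blocking key: lowercase last name plus first initial."""
    parts = name.lower().split()
    return f"{parts[-1]}_{parts[0][0]}"


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative(ids, vectors, threshold):
    """Average-linkage agglomerative clustering; stop merging once the
    closest pair of clusters is farther apart than `threshold`."""
    clusters = [[i] for i in ids]

    def linkage(c1, c2):
        dists = [cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j by construction).
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def disambiguate(mentions, vectors, threshold=0.3):
    """Group author mentions into blocks, then cluster inside each block."""
    blocks = defaultdict(list)
    for mention_id, name in mentions.items():
        blocks[block_key(name)].append(mention_id)
    result = []
    for ids in blocks.values():
        result.extend(agglomerative(ids, vectors, threshold))
    return result


# Toy data (assumed): mention id -> author name string, and
# mention id -> embedding vector standing in for a learned KGE.
mentions = {"m1": "Wei Wang", "m2": "W. Wang", "m3": "Wei Wang", "m4": "Aldo Gangemi"}
vectors = {"m1": [1.0, 0.0], "m2": [0.9, 0.1], "m3": [0.0, 1.0], "m4": [0.5, 0.5]}

if __name__ == "__main__":
    for cluster in disambiguate(mentions, vectors):
        print(cluster)
```

Blocking keeps the pairwise comparisons inside small groups of mentions with compatible names, so the clustering step never compares every mention against every other. The distance threshold decides how many distinct authors each block is split into, which is why it has to be tuned on labeled data in practice.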
Pages: 4887-4912
Number of pages: 25