CluEval: A Python tool for evaluating clustering performance in named entity disambiguation

Cited by: 1
Authors
Kim, Jinseok [1, 2]
Kim, Jenna [3]
Affiliations
[1] Univ Michigan, Inst Social Res, Survey Res Ctr, 330 Packard St, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Sch Informat, 105 S State St, Ann Arbor, MI 48109 USA
[3] Univ Illinois, Sch Informat Sci, 501 E Daniel St, Champaign, IL USA
Funding
National Science Foundation (USA);
Keywords
Named entity disambiguation; Author name disambiguation; Clustering; B-cubed; Pairwise F; K-metric;
DOI
10.1016/j.simpa.2023.100510
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
CluEval is a Python tool designed to assess the accuracy of named entity disambiguation methods in clustering performance. It allows users to employ five commonly used clustering evaluation metrics in entity disambiguation research. With newly developed, comprehensive, and fast algorithms, CluEval can handle large-scale computations and help users better understand and systematically interpret the clustering performance of named entity disambiguation methods. This tool can serve as a stand-alone evaluation code or be integrated as a module into any named entity disambiguation framework that produces clusters as the final disambiguation outputs, such as author name disambiguation.
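To make concrete the kind of metric the abstract refers to, the following is a minimal sketch of two of the listed measures, pairwise F and B-cubed, computed from a contingency count of true versus predicted cluster labels. It is not CluEval's API or its algorithm: the function names `pairwise_f1` and `b_cubed`, the dictionary input format, and the toy records are assumptions made purely for illustration.

```python
from collections import Counter
from math import comb


def _counts(truth, pred):
    """Build overlap and cluster-size counts (hypothetical helper)."""
    ids = list(truth)
    joint = Counter((truth[i], pred[i]) for i in ids)   # overlap sizes
    t_sizes = Counter(truth[i] for i in ids)            # true cluster sizes
    p_sizes = Counter(pred[i] for i in ids)             # predicted cluster sizes
    return joint, t_sizes, p_sizes


def _f1(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def pairwise_f1(truth, pred):
    """Pairwise precision, recall, F1 over record pairs placed together."""
    joint, t_sizes, p_sizes = _counts(truth, pred)
    correct_pairs = sum(comb(n, 2) for n in joint.values())
    pred_pairs = sum(comb(n, 2) for n in p_sizes.values())
    true_pairs = sum(comb(n, 2) for n in t_sizes.values())
    # Convention assumed here: with no pairs at all, the score is 1.0.
    precision = correct_pairs / pred_pairs if pred_pairs else 1.0
    recall = correct_pairs / true_pairs if true_pairs else 1.0
    return precision, recall, _f1(precision, recall)


def b_cubed(truth, pred):
    """B-cubed precision, recall, F1 averaged over individual records."""
    joint, t_sizes, p_sizes = _counts(truth, pred)
    n = len(truth)
    # Each of the c records in an overlap cell scores c / cluster size.
    precision = sum(c * c / p_sizes[p] for (_, p), c in joint.items()) / n
    recall = sum(c * c / t_sizes[t] for (t, _), c in joint.items()) / n
    return precision, recall, _f1(precision, recall)


# Toy data: five records, two true entities, two predicted clusters.
truth = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B"}
pred = {"r1": "x", "r2": "x", "r3": "y", "r4": "y", "r5": "y"}

print(pairwise_f1(truth, pred))
print(b_cubed(truth, pred))
```

Counting cluster overlaps once and deriving both metrics from those counts avoids enumerating every record pair explicitly, which matters when a name block contains many records.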
Pages: 3