Completing features for author name disambiguation (AND): an empirical analysis

Cited by: 0
Authors
Humaira Waqas
Abdul Qadir
Affiliation
[1] Capital University of Science and Technology,
Source
Scientometrics | 2022 / Vol. 127
Keywords
Digital libraries; Author name disambiguation; AND; AND datasets;
DOI
Not available
Abstract
This study presents a feature-enriched AND dataset to support the development of diverse, better-performing AND techniques that draw on features with stronger discriminating ability. Current AND datasets contain a limited number of useful features; some were curated with specific scenarios or contexts in mind, and some are domain specific. Rather than restricting labelled datasets to a single domain, a particular context, or a limited set of feature values, it is better to leave such restrictions to the technique that is trying to solve the problem. In this paper, the proposed labelled dataset, “CustAND”, provides 7886 publication records, each covering more than eleven useful feature values. The dataset spans multiple domains and includes authors from different ethnic groups. CustAND is collected from multiple web sources, with raw data extracted from digital libraries and search engines. The data is then cross-checked, hand-labelled, and confirmed (authorship confirmation) by a team of graduate students with 100% accuracy. After pre-processing, the raw data is validated against authors’ personal web pages, profile pages, affiliations, and emails. The new dataset supplies useful feature values that are crucial for developing generic, better-performing techniques for the author name ambiguity problem commonly faced by digital libraries.
Pages: 1039-1063
Page count: 24