ANDez: An open-source tool for author name disambiguation using machine learning

Times cited: 0
Authors
Kim, Jinseok [1, 2]
Kim, Jenna [3 ]
Affiliations
[1] Univ Michigan, Inst Social Res, 330 Packard St, Ann Arbor, MI 48104 USA
[2] Univ Michigan, Sch Informat, 330 Packard St, Ann Arbor, MI 48104 USA
[3] Univ Illinois, Sch Informat Sci, 501 E Daniel St, Champaign, IL 61820 USA
Funding
U.S. National Science Foundation;
Keywords
Author name disambiguation; Authority control; Machine learning; Science of science; Scientometrics; Bibliometrics;
DOI
10.1016/j.softx.2024.101719
Chinese Library Classification number
TP31 [Computer software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Author name disambiguation in bibliographic data is challenging due to the same names of different authors and name variations of authors. Various machine learning (ML) methods address this, but a unified framework for comparing them is lacking. This study introduces ANDez, an open-source tool that integrates top-performing ML techniques for author name disambiguation. Developed in Python using popular ML libraries, ANDez provides a transparent system, merging complex procedures from different ML approaches. This promotes the assessment, modification, and benchmarking of ML techniques in author name disambiguation. ANDez's user-friendly design also helps researchers analyze ambiguous bibliographic data without needing advanced ML coding expertise.
Pages: 7
Related papers
37 in total
[1]   ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions [J].
Albert, Paul J. ;
Dutta, Sarbajit ;
Lin, Jie ;
Zhu, Zimeng ;
Bales, Michael ;
Johnson, Stephen B. ;
Mansour, Mohammad ;
Wright, Drew ;
Wheeler, Terrie R. ;
Cole, Curtis L. .
PLOS ONE, 2021, 16 (04)
[2]   The Impact of Name-Matching and Blocking on Author Disambiguation [J].
Backes, Tobias .
CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, :803-812
[3]   Has Large-Scale Named-Entity Network Analysis Been Resting on a Flawed Assumption? [J].
Fegley, Brent D. ;
Torvik, Vetle I. .
PLOS ONE, 2013, 8 (07)
[4]   A Brief Survey of Automatic Methods for Author Name Disambiguation [J].
Ferreira, Anderson A. ;
Goncalves, Marcos Andre ;
Laender, Alberto H. F. .
SIGMOD RECORD, 2012, 41 (02) :15-26
[5]   Two supervised learning approaches for name disambiguation in author citations [J].
Han, H ;
Giles, L ;
Zha, H ;
Li, C ;
Tsioutsiouliklis, K .
JCDL 2004: PROCEEDINGS OF THE FOURTH ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES: GLOBAL REACH AND DIVERSE IMPACT, 2004, :296-305
[6]  
Han H., 2005, P 2005 ACM S APPL CO, P1065, DOI 10.1145/1066677.1066920
[7]  
Huang J, 2006, Efficient Name Disambiguation for Large-Scale Databases
[8]   A survey of author name disambiguation techniques: 2010-2016 [J].
Hussain, Ijaz ;
Asghar, Sohail .
KNOWLEDGE ENGINEERING REVIEW, 2017, 32
[9]   CluEval: A Python tool for evaluating clustering performance in named entity disambiguation [J].
Kim, Jinseok ;
Kim, Jenna .
SOFTWARE IMPACTS, 2023, 16
[10]   Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation [J].
Kim, Jinseok ;
Kim, Jenna ;
Kim, Jinmo .
JOURNAL OF INFORMATION SCIENCE, 2023, 49 (03) :711-725