AuthCrowd: Author Name Disambiguation and Entity Matching using Crowdsourcing

Cited by: 3
Authors
Correia, Antonio [1, 2]
Guimaraes, Diogo [1, 2]
Paulino, Dennis [1, 2]
Jameel, Shoaib [3]
Schneider, Daniel [4]
Fonseca, Benjamim [1, 2]
Paredes, Hugo [1, 2]
Affiliations
[1] INESC TEC, Apartado 1013, Vila Real, Portugal
[2] Univ Tras Os Montes & Alto Douro, UTAD, Apartado 1013, Vila Real, Portugal
[3] Univ Essex, Sch Comp Sci & Elect Engn, Colchester Campus, Colchester, Essex, England
[4] NCE UFRJ, Tercio Pacitti Inst Comp Applicat & Res, Rio De Janeiro, Brazil
Source
PROCEEDINGS OF THE 2021 IEEE 24TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD) | 2021
Keywords
author name disambiguation; crowdsourcing; entity matching; evaluation; scientometrics; task design
DOI
10.1109/CSCWD49262.2021.9437769
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Despite decades of research and development in named entity resolution, dealing with name ambiguity is still a challenging issue for many bibliometric-enhanced information retrieval (IR) tasks. As new bibliographic datasets are created as a result of the upward growth of publication records worldwide, more problems arise when considering the effects of errors resulting from missing data fields, duplicate entities, misspellings, extra characters, etc. As these concerns tend to be large-scale, both the general consistency and the quality of electronic data are largely affected. This paper presents an approach to handle these name ambiguity problems through the use of crowdsourcing as a complementary means to traditional unsupervised approaches. To this end, we present "AuthCrowd", a crowdsourcing system with the ability to decompose named entity disambiguation and entity matching tasks. Experimental results on a real-world dataset of publicly available papers published in peer-reviewed venues demonstrate the potential of our proposed approach for improving author name disambiguation. The findings further highlight the importance of adopting hybrid crowd-algorithm collaboration strategies, especially for handling complexity and quantifying bias when working with large amounts of data.
Pages: 150-155
Number of pages: 6