A Web Service for Author Name Disambiguation in Scholarly Databases

Cited by: 17
Authors
Kim, Kunho [1]
Sefid, Athar [1]
Weinberg, Bruce A. [3]
Giles, C. Lee [1, 2]
Affiliations
[1] Penn State Univ, Comp Sci & Engn, University Pk, PA 16801 USA
[2] Penn State Univ, Informat Sci & Technol, University Pk, PA 16801 USA
[3] Ohio State Univ, Dept Econ, Columbus, OH 43210 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES (IEEE ICWS 2018) | 2018
Funding
National Science Foundation (USA)
Keywords
Web services; search; PubMed; author name disambiguation; PRODUCTIVITY;
DOI
10.1109/ICWS.2018.00041
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Author Name Disambiguation (AND) is the task of clustering unique author names from publication records in scholarly or related databases. Although AND has been extensively studied and has served as an important preprocessing step for several tasks (e.g., calculating bibliometrics and scientometrics for authors), there are few publicly available tools for disambiguation in large-scale scholarly databases. Furthermore, most of the disambiguated data is embedded within the search engines of the scholarly databases, and existing application programming interfaces (APIs) have limited features and are often unavailable to users for various reasons. This makes it difficult for researchers and developers to use the data for applications (e.g., author search) or research. Here, we design a novel, web-based, RESTful API for searching disambiguated authors, using the PubMed database as a sample application. We offer two types of queries, attribute-based queries and record-based queries, which serve different purposes. Attribute-based queries retrieve authors by the attributes available in the database. We study different search engines to find the most appropriate one for processing attribute-based queries. Record-based queries retrieve the authors that are most likely to have written a query publication provided by a user. To accelerate record-based queries, we develop a novel algorithm that performs fast record-to-cluster matching. We show that our algorithm can accelerate the query by a factor of 4.01 compared to a naive baseline approach.
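The abstract does not spell out how the record-to-cluster match works, so the following is only a minimal sketch of the general idea behind a record-based query, not the authors' algorithm. It contrasts a naive baseline, which scores the query record against every author cluster, with a variant that first narrows the candidates using a name-based blocking key. The `Record` schema, the coauthor-overlap `similarity` function, the blocking key, and all sample names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    """One author mention on one publication (hypothetical schema)."""
    last_name: str
    first_initial: str
    coauthors: frozenset = field(default_factory=frozenset)


def similarity(a: Record, b: Record) -> float:
    """Toy stand-in score: Jaccard overlap of the two coauthor sets."""
    union = a.coauthors | b.coauthors
    return len(a.coauthors & b.coauthors) / len(union) if union else 0.0


def block_key(r: Record) -> tuple:
    """Assumed blocking key: lowercase last name plus first initial."""
    return (r.last_name.lower(), r.first_initial.lower())


def best_cluster(query, clusters, candidate_ids):
    """Return (cluster_id, score) with the highest mean similarity to the query."""
    best = (None, 0.0)
    for cid in candidate_ids:
        members = clusters[cid]
        score = sum(similarity(query, m) for m in members) / len(members)
        if score > best[1]:
            best = (cid, score)
    return best


def naive_match(query, clusters):
    """Baseline: compare the query record against every cluster."""
    return best_cluster(query, clusters, clusters.keys())


def build_index(clusters):
    """Map each blocking key to the ids of clusters containing that key."""
    index = defaultdict(set)
    for cid, members in clusters.items():
        for m in members:
            index[block_key(m)].add(cid)
    return index


def blocked_match(query, clusters, index):
    """Faster variant: only score clusters that share the query's blocking key."""
    return best_cluster(query, clusters, index.get(block_key(query), set()))


# Hypothetical disambiguated clusters (cluster id -> member records).
clusters = {
    "smith_j_1": [
        Record("Smith", "J", frozenset({"Author A", "Author B"})),
        Record("Smith", "J", frozenset({"Author A", "Author C"})),
    ],
    "smith_j_2": [Record("Smith", "J", frozenset({"Author D"}))],
    "jones_r_1": [Record("Jones", "R", frozenset({"Author E"}))],
}
index = build_index(clusters)
query = Record("Smith", "J", frozenset({"Author A"}))

print(naive_match(query, clusters))
print(blocked_match(query, clusters, index))
```

Both functions select the same cluster here, but the blocked variant never scores `jones_r_1`. On a large database that kind of candidate pruning is where a speedup would come from in this sketch; the 4.01x figure reported in the abstract refers to the paper's own algorithm, not to this code.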
Pages: 265-273
Number of pages: 9