Hybrid Deep Pairwise Classification for Author Name Disambiguation

Times cited: 22
Authors
Kim, Kunho [1]
Rohatgi, Shaurya [1]
Giles, C. Lee [1]
Affiliation
[1] Penn State Univ, University Pk, PA 16802 USA
Source
PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19) | 2019
Funding
U.S. National Science Foundation
Keywords
Author Name Disambiguation; Pairwise Classification; Gradient Boosted Trees; Representation Learning
DOI
10.1145/3357384.3358153
Chinese Library Classification number
TP301 [Theory and Methods]
Discipline code
081202
Abstract
Author name disambiguation (AND) can be defined as the problem of clustering all author mentions extracted from publications or related records in digital libraries or other sources into groups, each corresponding to a unique author. Pairwise classification is an essential part of AND; it estimates the probability that a given pair of author mentions belongs to the same author. Previous studies trained classifiers on features manually extracted from each attribute of the data. More recently, others trained models to learn a vector representation from text without considering the structure of the data. Both approaches have advantages: the former exploits the structure of the data, while the latter captures textual similarity across attributes. Here, we introduce a hybrid method that takes advantage of both approaches by extracting both structure-aware features and global features. In addition, we introduce a novel way to train a global model using a large number of negative samples. Results on AMiner and PubMed data show a relative improvement in mean average precision (MAP) of more than 7.45% over previous state-of-the-art methods.
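The abstract does not list the actual features, so the following is only a minimal sketch of the general idea of a hybrid pairwise feature vector, under loudly stated assumptions. Each author mention is assumed to be a record with hypothetical `title`, `venue`, and `coauthors` fields. The structure-aware features compare one attribute at a time, while the global feature compares all attributes flattened into a single text. A bag-of-words cosine similarity is swapped in as a stand-in for the learned vector representation described above; the paper's trained global model and its negative-sampling scheme are not reproduced here. The resulting vector would then be fed to a pairwise classifier such as gradient boosted trees (listed among the keywords), which is also omitted.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two collections (as sets); 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bag_of_words(text):
    """Stand-in for a learned text embedding (assumption): lowercase token counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def structure_aware_features(m1, m2):
    """Hypothetical per-attribute similarities, each computed on one field only."""
    return [
        jaccard(m1["coauthors"], m2["coauthors"]),        # shared co-author names
        1.0 if m1["venue"] == m2["venue"] else 0.0,       # exact venue match
        jaccard(m1["title"].lower().split(), m2["title"].lower().split()),
    ]


def global_features(m1, m2):
    """One similarity over all attributes flattened into a single text (ignores structure)."""
    def flatten(m):
        return " ".join([m["title"], m["venue"], " ".join(m["coauthors"])])

    return [cosine(bag_of_words(flatten(m1)), bag_of_words(flatten(m2)))]


def hybrid_features(m1, m2):
    """Concatenate both feature groups into one vector for a pairwise classifier."""
    return structure_aware_features(m1, m2) + global_features(m1, m2)


if __name__ == "__main__":
    # Two made-up mentions used purely for illustration.
    a = {"title": "Author name disambiguation with deep features",
         "venue": "CIKM", "coauthors": ["S. Rohatgi", "C. L. Giles"]}
    b = {"title": "Deep pairwise classification for name disambiguation",
         "venue": "CIKM", "coauthors": ["C. L. Giles"]}
    print(hybrid_features(a, b))
```

Every feature in this sketch is symmetric in its two arguments, so swapping the two mentions yields the same vector, which is a convenient property when the classifier's output is meant to be a same-author probability for an unordered pair.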
Pages: 2369-2372
Number of pages: 4