Multiple Features Driven Author Name Disambiguation

Cited by: 7
Authors
Zhou, Qian [1]
Chen, Wei [1]
Wang, Weiqing [2]
Xu, Jiajie [1]
Zhao, Lei [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Inst Artificial Intelligence, Suzhou, Peoples R China
[2] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
Source
2021 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, ICWS 2021 | 2021
Funding
National Natural Science Foundation of China
Keywords
author name disambiguation; multiple features; binary classification; pruning strategy
DOI
10.1109/ICWS53863.2021.00071
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Author Name Disambiguation (AND) has received growing attention as the number of academic publications increases. To tackle the AND problem, existing studies have proposed many approaches based on different types of information, such as raw document features (e.g., co-authors, title, and keywords), fusion features (e.g., a hybrid publication embedding based on raw document features), local structural information (e.g., a publication's neighborhood information on a graph), and global structural information (e.g., the interactive information between a node and others on a graph). However, no work so far has taken all of the above-mentioned information into account for the AND problem. To fill this gap, we propose a novel framework named MFAND (Multiple Features Driven Author Name Disambiguation). Specifically, we first employ the raw document and fusion features to construct six similarity graphs for each author name to be disambiguated. Next, the global and local structural information extracted from these graphs is fed into a novel encoder called 123.IG, which integrates and reconstructs the above-mentioned four types of information associated with an author, with the goal of learning latent information that enhances the generalization ability of MFAND. Then, the integrated and reconstructed information is fed into a binary classification model for disambiguation. Note that several pruning strategies are applied before the information extraction to remove noise effectively. Finally, the proposed framework is evaluated on two real-world datasets, and the experimental results show that MFAND performs better than all state-of-the-art methods.
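The abstract does not specify how the similarity graphs are built or pruned, so the following is only a minimal illustrative sketch and not the MFAND implementation. It assumes Jaccard similarity over one raw document feature and a simple weight threshold as a stand-in for a pruning strategy; the function names, the `threshold` parameter, and the toy records are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections of tokens."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_graph(pubs, feature, threshold):
    """Build a weighted graph over publications sharing an ambiguous name.

    Nodes are publication indices. An edge (i, j) is kept only if the
    similarity of the chosen feature reaches `threshold`; dropping weak
    edges is an assumed, simplified form of pruning.
    """
    edges = {}
    for (i, p), (j, q) in combinations(enumerate(pubs), 2):
        weight = jaccard(p[feature], q[feature])
        if weight >= threshold:
            edges[(i, j)] = weight
    return edges


# Hypothetical toy records for a single ambiguous author name.
pubs = [
    {"coauthors": ["A", "B"], "keywords": ["disambiguation", "graph"]},
    {"coauthors": ["A", "C"], "keywords": ["graph", "embedding"]},
    {"coauthors": ["D"], "keywords": ["protein"]},
]

# One graph per feature; MFAND builds six such graphs per name.
coauthor_graph = build_similarity_graph(pubs, "coauthors", threshold=0.3)
keyword_graph = build_similarity_graph(pubs, "keywords", threshold=0.3)
```

In this sketch, each feature yields its own graph, which mirrors the idea of constructing several similarity graphs per author name; the structural information extraction, the encoder, and the binary classifier described in the abstract are not modeled here.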
Pages: 506-515
Number of pages: 10