Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks

Cited by: 21

Authors:

Sun, Qingyun [1,2]

Peng, Hao [1]

Li, Jianxin [1]

Wang, Senzhang [3]

Dong, Xiangyu [1]

Zhao, Liangxuan [1]

Yu, Philip S. [4]

He, Lifang [5]

Affiliations:

[1] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China

[2] Beihang Univ, Shenyuan Honors Coll, Beijing 100191, Peoples R China

[3] Nanjing Univ Aeronaut & Astronaut, Nanjing 211106, Peoples R China

[4] Univ Illinois, Chicago, IL 60607 USA

[5] Lehigh Univ, Bethlehem, PA 18015 USA

Source:

20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2020) | 2020

Funding:

National Key Research and Development Program of China;

Keywords:

Name disambiguation; graph embedding; pairwise learning; heterogeneous information network;

DOI:

10.1109/ICDM50108.2020.00060

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Name disambiguation aims to distinguish different authors who share the same name. Existing name disambiguation methods typically exploit author attributes to improve disambiguation results. However, some discriminative author attributes (e.g., email and affiliation) may change because of graduation or job changes, which causes the same author's papers to be split apart in digital libraries. Although these attributes may change, an author's co-authors and research topics do not change frequently over time, which means that papers within a period have similar text and relation information in the academic network. Inspired by this idea, we introduce the Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem. We divide papers into small blocks based on discriminative author attributes, and blocks belonging to the same author are merged according to the pairwise classification results of MA-PairRNN. MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning in a single framework. In addition to attribute and structure information, MA-PairRNN also exploits semantic information through meta-paths and generates node representations inductively, which makes it scalable to large graphs. Furthermore, a semantic-level attention mechanism is adopted to fuse multiple meta-path-based representations. A Pseudo-Siamese network consisting of two RNNs takes two paper sequences in publication time order as input and outputs their similarity. Results on two real-world datasets demonstrate that our framework achieves a significant and consistent performance improvement on the name disambiguation task. The results also show that MA-PairRNN performs well with a small amount of training data and generalizes better across different research areas.
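
The block-then-merge step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `merge_blocks`, the `same_author` callback, and the toy classifier are hypothetical names introduced here, and union-find is simply one convenient way to merge blocks once pairwise decisions are available. In the paper, the pairwise decision would come from MA-PairRNN; here a trivial stand-in is used.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Block = Sequence[str]  # a block: paper ids in publication time order


def merge_blocks(
    blocks: List[Block],
    same_author: Callable[[Block, Block], bool],
) -> List[List[str]]:
    """Merge blocks that a pairwise classifier judges to share an author."""
    parent = list(range(len(blocks)))  # union-find forest over block indices

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Ask the classifier about every pair of blocks and union the positives.
    for i, j in combinations(range(len(blocks)), 2):
        if same_author(blocks[i], blocks[j]):
            parent[find(i)] = find(j)

    # Collect the papers of all blocks that ended up under the same root.
    merged: Dict[int, List[str]] = {}
    for idx, block in enumerate(blocks):
        merged.setdefault(find(idx), []).extend(block)
    return list(merged.values())


def toy_same_author(a: Block, b: Block) -> bool:
    # Hypothetical stand-in for a learned pairwise model: two blocks match
    # when their first paper ids start with the same letter.
    return a[0][0] == b[0][0]


blocks = [["a1", "a2"], ["b1"], ["a3"]]
clusters = merge_blocks(blocks, toy_same_author)
```

Comparing every pair of blocks is quadratic in the number of blocks, which is acceptable here only because blocks are formed per ambiguous name; the sketch makes no claim about how the paper schedules these comparisons.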


Pages: 511-520

Page count: 10
