Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks

Cited by: 21
Authors
Sun, Qingyun [1, 2]
Peng, Hao [1]
Li, Jianxin [1]
Wang, Senzhang [3]
Dong, Xiangyu [1]
Zhao, Liangxuan [1]
Yu, Philip S. [4]
He, Lifang [5]
Affiliations
[1] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China
[2] Beihang Univ, Shenyuan Honors Coll, Beijing 100191, Peoples R China
[3] Nanjing Univ Aeronaut & Astronaut, Nanjing 211106, Peoples R China
[4] Univ Illinois, Chicago, IL 60607 USA
[5] Lehigh Univ, Bethlehem, PA 18015 USA
Source
20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2020) | 2020
Funding
National Key Research and Development Program of China;
Keywords
Name disambiguation; graph embedding; pairwise learning; heterogeneous information network;
DOI
10.1109/ICDM50108.2020.00060
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Name disambiguation aims to identify unique authors who share the same name. Existing name disambiguation methods always exploit author attributes to improve disambiguation results. However, some discriminative author attributes (e.g., email and affiliation) may change because of graduation or job changes, which causes the same author's papers to be split apart in digital libraries. Although these attributes may change, an author's co-authors and research topics do not change frequently over time, which means that papers published within a given period have similar text and relation information in the academic network. Inspired by this idea, we introduce the Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem. We divide papers into small blocks based on discriminative author attributes, and blocks belonging to the same author are merged according to the pairwise classification results of MA-PairRNN. MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning in a single framework. In addition to attribute and structure information, MA-PairRNN also exploits semantic information through meta-paths and generates node representations inductively, which makes it scalable to large graphs. Furthermore, a semantic-level attention mechanism is adopted to fuse multiple meta-path-based representations. A Pseudo-Siamese network consisting of two RNNs takes two paper sequences, ordered by publication time, as input and outputs their similarity. Results on two real-world datasets demonstrate that our framework achieves a significant and consistent performance improvement on the name disambiguation task. The results also show that MA-PairRNN performs well with a small amount of training data and generalizes better across different research areas.
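The block-then-merge step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the pairwise decision below is a stand-in (Jaccard overlap of co-author sets with an arbitrary threshold) used in place of the MA-PairRNN classifier, and the names `merge_blocks`, `coauthor_jaccard`, and the paper dictionaries are hypothetical. It only shows how pairwise "same author" decisions between blocks could be turned into merged groups with a union-find structure.

```python
from itertools import combinations


def merge_blocks(blocks, similar):
    """Merge blocks that a pairwise classifier judges to share an author.

    `blocks` is a list of paper lists; `similar(a, b)` returns True when
    two blocks should be merged. Uses union-find so merges are transitive.
    """
    parent = list(range(len(blocks)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(blocks)), 2):
        if similar(blocks[i], blocks[j]):
            parent[find(j)] = find(i)

    groups = {}
    for i, block in enumerate(blocks):
        groups.setdefault(find(i), []).extend(block)
    return list(groups.values())


def coauthor_jaccard(block_a, block_b, threshold=0.3):
    """Stand-in for the learned pairwise classifier (assumption, not MA-PairRNN)."""
    a = set().union(*(p["coauthors"] for p in block_a))
    b = set().union(*(p["coauthors"] for p in block_b))
    union = a | b
    return bool(union) and len(a & b) / len(union) >= threshold


# Hypothetical blocks: each was formed from a different email/affiliation.
blocks = [
    [{"id": "p1", "coauthors": {"Li", "Peng"}}],
    [{"id": "p2", "coauthors": {"Li", "Peng", "Yu"}}],
    [{"id": "p3", "coauthors": {"Smith"}}],
]
merged = merge_blocks(blocks, coauthor_jaccard)
merged_ids = [[p["id"] for p in group] for group in merged]
print(merged_ids)
```

In this toy input the first two blocks share most of their co-authors and end up in one group, while the third stays separate. Swapping `coauthor_jaccard` for a trained pairwise model would leave the merging logic unchanged.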
Pages: 511-520
Number of pages: 10