A Unified Probabilistic Framework for Name Disambiguation in Digital Library

Cited by: 177
Authors
Tang, Jie [1]
Fong, A. C. M. [2]
Wang, Bo [3]
Zhang, Jing [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Auckland Univ Technol, Sch Comp & Math Sci, Auckland 1142, New Zealand
[3] Nanjing Univ Aeronaut & Astronaut, Dept Comp Sci, Nanjing 210016, Peoples R China
Keywords
Digital libraries; information search and retrieval; database applications; heterogeneous databases; MODEL
DOI
10.1109/TKDE.2011.13
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite years of research, the name ambiguity problem remains largely unresolved. Outstanding issues include how to capture all information for name disambiguation in a unified approach, and how to determine the number of people K in the disambiguation process. In this paper, we formalize the problem in a unified probabilistic framework, which incorporates both attributes and relationships. Specifically, we define a disambiguation objective function for the problem and propose a two-step parameter estimation algorithm. We also investigate a dynamic approach for estimating the number of people K. Experiments show that our proposed framework significantly outperforms four baseline methods of using clustering algorithms and two other previous methods. Experiments also indicate that the number K automatically found by our method is close to the actual number.
Pages: 975-987
Number of pages: 13