Name disambiguation in author citations using a K-way spectral clustering method

Cited by: 127
Authors
Han, H [1]
Zha, HY [1]
Giles, CL [1]
Affiliations
[1] Yahoo Inc, Sunnyvale, CA 95129 USA
Source
PROCEEDINGS OF THE 5TH ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES, PROCEEDINGS | 2005
Keywords
name disambiguation; feature selection; unsupervised learning; spectral clustering
DOI
10.1145/1065385.1065462
Chinese Library Classification code
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
An author may have multiple names and multiple authors may share the same name simply due to name abbreviations, identical names, or name misspellings in publications or bibliographies (citations). This can produce name ambiguity which can affect the performance of document retrieval, web search, and database integration, and may cause improper attribution of credit. Proposed here is an unsupervised learning approach using K-way spectral clustering that disambiguates authors in citations. The approach utilizes three types of citation attributes: co-author names, paper titles, and publication venue titles. The approach is illustrated with 16 name datasets with citations collected from the DBLP database bibliography and author home pages and shows that name disambiguation can be achieved using these citation attributes.
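The idea in the abstract can be illustrated with a minimal sketch, assuming NumPy is available. It is not the authors' implementation: the citation records and the author names in them are hypothetical, features are raw token counts (the paper's actual feature weighting is not given in this record), and the final assignment step is a plain k-means with a farthest-first start, which may differ from the paper's method. The function names `citation_vectors` and `spectral_cluster` are made up for this example.

```python
import numpy as np

# Hypothetical citations that all carry the same ambiguous author name.
# The first three and the last three are meant to belong to different people.
citations = [
    {"coauthors": ["A. Kumar", "B. Lee"],
     "title": "query optimization for relational databases", "venue": "VLDB"},
    {"coauthors": ["A. Kumar"],
     "title": "indexing relational databases", "venue": "SIGMOD"},
    {"coauthors": ["B. Lee"], "title": "query indexing", "venue": "VLDB"},
    {"coauthors": ["C. Wong"],
     "title": "protein folding simulation", "venue": "Bioinformatics"},
    {"coauthors": ["C. Wong", "D. Park"],
     "title": "protein structure prediction", "venue": "Bioinformatics"},
    {"coauthors": ["D. Park"], "title": "folding pathways", "venue": "ISMB"},
]

def citation_vectors(records):
    """Count vectors over the three attribute types: co-authors, title, venue."""
    vocab, rows = {}, []
    for r in records:
        tokens = (["coauthor:" + a.lower() for a in r["coauthors"]]
                  + ["title:" + w for w in r["title"].lower().split()]
                  + ["venue:" + w for w in r["venue"].lower().split()])
        rows.append([vocab.setdefault(t, len(vocab)) for t in tokens])
    X = np.zeros((len(records), len(vocab)))
    for i, idxs in enumerate(rows):
        for j in idxs:
            X[i, j] += 1.0
    return X

def spectral_cluster(X, k, n_iter=20):
    """K-way spectral clustering on a cosine-similarity affinity matrix."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    A = Xn @ Xn.T                                   # pairwise cosine similarity
    d = A.sum(axis=1)
    d = np.where(d > 0, d, 1.0)                     # guard against empty rows
    L = A / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]   # D^-1/2 A D^-1/2
    _, vecs = np.linalg.eigh(L)                     # eigenvalues in ascending order
    U = vecs[:, -k:]                                # top-k eigenvectors
    row_norms = np.linalg.norm(U, axis=1, keepdims=True)
    U = U / np.where(row_norms == 0, 1.0, row_norms)
    # Farthest-first initialisation, then a few k-means iterations.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

labels = spectral_cluster(citation_vectors(citations), k=2)
print(labels)
```

Here `k` plays the role of the number of distinct authors behind the shared name, and the sketch takes it as given. Citations that receive the same label are attributed to the same person.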
Pages: 334-343
Page count: 10