Duplicate Literature Detection for Cross-Library Search

Cited by: 2
Authors
Liu, Wei [1]
Zeng, Jianxun [1]
Affiliations
[1] Inst Sci & Tech Informat China, Beijing 10038, Peoples R China
Keywords
Information integration; digital library; duplicate detection; schema mapping; data cleaning
DOI
10.1515/cait-2016-0028
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The proliferation of online digital libraries gives users a great opportunity to search for the literature they need on the Web. Cross-library search applications help users retrieve more literature information from multiple digital libraries. Because digital libraries are heterogeneous and autonomous, duplicate literature detection is a necessary step when merging search results from multiple libraries. To this end, this paper proposes a holistic solution that includes automatic training-set acquisition, holistic attribute mapping, and attribute weight training. Experiments on real digital libraries show that the proposed solution is highly effective.
Pages: 160-178
Page count: 19