Entity resolution for media metadata based on structural clustering

Cited by: 0
Authors
Gu, Qi [1, 2]
Cao, Jian [1]
Liu, Yancen [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Sch Elect Informat & Elect Engn, 800 Dongchuan Rd, Shanghai 200240, Peoples R China
[2] Nantong Univ, Sch Informat Sci & Technol, Dept Comp Sci, 9 Seyuan Rd, Nantong 226019, Peoples R China
Keywords
Entity resolution; Structural clustering; Iterative propagation; Graph structure
DOI
10.1007/s11042-019-08062-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
An increasing amount of media metadata is published on the Web by different organizations, which leads to a fragmented dataset landscape. Identifying media metadata across disparate datasets and integrating heterogeneous datasets have many applications but also pose significant challenges. Entity resolution methods are commonly used to tackle this problem: they are an essential prerequisite for integrating media information from different sources, and they foster the re-use of existing data sources. As the amount of media metadata published on the Web grows steadily, scaling entity resolution to large media knowledge bases while maintaining high matching quality becomes a critical challenge. This article investigates the relationships between media entities. To that end, the media database is formulated as a knowledge graph with entities as nodes and the associations between related entities as edges, so that media entities can be grouped into communities according to the neighbors they share. A structural clustering-based model is then proposed to detect communities and to discover anchor vertices as well as isolated vertices. From these, an initial seed set of matched anchor vertex pairs is obtained. An iterative propagation approach is then developed to identify the matched entities in the whole graph, where community similarity is introduced into the measure function to control the total measurement of candidate pairs. Starting with the elements of the initial seed set, the entity resolution algorithm iteratively updates the matching information over the whole network along the neighbor relationships. Extensive experiments are conducted on real datasets to evaluate how the seed set affects the matching process and performance. The experimental results show that the model achieves a good balance between accuracy and efficiency and improves on state-of-the-art methods.
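The abstract does not give the measure function or the clustering details, so the following is only a minimal Python sketch of the two ideas it names, under stated assumptions. `structural_similarity` uses the SCAN-style definition (shared closed neighborhoods, normalized by their geometric mean), which is assumed here as the structural-clustering building block. `propagate_matches` is a hypothetical, simplified stand-in for the iterative propagation step: it scores a candidate pair only by how many of its neighbors are already matched to each other and omits the community-similarity term. All function names, the threshold, and the toy graphs are illustrative assumptions, not the authors' implementation.

```python
from math import sqrt

# Assumed representation: node -> set of neighbours, symmetric (undirected).
Graph = dict[str, set[str]]


def structural_similarity(adj: Graph, u: str, v: str) -> float:
    """SCAN-style similarity of two nodes based on their closed neighbourhoods."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def propagate_matches(adj_a: Graph, adj_b: Graph, seeds: dict[str, str],
                      threshold: float = 0.3) -> dict[str, str]:
    """Greedily grow a one-to-one matching outward from the seed pairs.

    Candidates are unmatched neighbours of already matched pairs. Each round
    accepts the single best-scoring candidate pair, and the loop stops when
    no candidate reaches the threshold.
    """
    matched = dict(seeds)
    while True:
        used_b = set(matched.values())
        best = None  # (score, node in graph A, node in graph B)
        for a, b in matched.items():
            for ca in sorted(adj_a[a]):
                if ca in matched:
                    continue
                for cb in sorted(adj_b[b]):
                    if cb in used_b:
                        continue
                    # Neighbours of ca whose match is a neighbour of cb.
                    shared = sum(1 for n in adj_a[ca]
                                 if n in matched and matched[n] in adj_b[cb])
                    score = shared / sqrt(len(adj_a[ca]) * len(adj_b[cb]))
                    if score >= threshold and (best is None or score > best[0]):
                        best = (score, ca, cb)
        if best is None:
            return matched
        matched[best[1]] = best[2]


# Toy example: two copies of the same small film graph with different IDs.
adj_a = {"film1": {"actor1", "director1"}, "actor1": {"film1", "film2"},
         "director1": {"film1"}, "film2": {"actor1"}}
adj_b = {"x_film1": {"x_actor1", "x_director1"}, "x_actor1": {"x_film1", "x_film2"},
         "x_director1": {"x_film1"}, "x_film2": {"x_actor1"}}
print(propagate_matches(adj_a, adj_b, seeds={"film1": "x_film1"}))
```

Accepting one pair per round keeps the sketch easy to follow, but it rescans every matched pair each time; a version meant for large knowledge bases would need a priority queue or incremental score updates.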
Pages: 219-242
Number of pages: 24