Entity resolution for distributed probabilistic data

Cited by: 4
Authors
Ayat, Naser [1, 2]
Akbarinia, Reza [3 ]
Afsarmanesh, Hamideh [1 ]
Valduriez, Patrick [3 ]
Affiliations
[1] Univ Amsterdam, Inst Informat, Amsterdam, Netherlands
[2] Payame Noor Univ, Tehran, Iran
[3] INRIA, ZENITH Team, LIRMM, Montpellier, France
Keywords
Entity resolution; Probabilistic data; Distributed data
DOI
10.1007/s10619-013-7129-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The problem of entity resolution over probabilistic data (ERPD) arises in many applications that have to deal with probabilistic data. In many of these applications, probabilistic data is distributed among a number of nodes. The simple, centralized approach to the ERPD problem does not scale well as large amounts of data need to be sent to a central node. In this paper, we present FD (Fully Distributed), a decentralized algorithm for dealing with the ERPD problem over distributed data, with the goal of minimizing bandwidth usage and reducing processing time. FD is completely distributed and does not depend on the existence of certain nodes. We validated FD through implementation over a 75-node cluster and simulation using the PeerSim simulator. We used both synthetic and real-world data in our experiments. Our performance evaluation shows that FD can achieve major performance gains in terms of bandwidth usage and response time.
Pages: 509-542
Page count: 34