A Similarity-Based Method for Entity Coreference Resolution in Big Data Environment

Cited by: 0
Authors
Geng, Yushui [1]
Li, Peng [1]
Zhao, Jing [1]
Affiliation
[1] Qilu Univ Technol, Sch Informat, Jinan 250353, Peoples R China
Source
PROCEEDINGS OF THE 2016 4TH INTERNATIONAL CONFERENCE ON ADVANCED MATERIALS AND INFORMATION TECHNOLOGY PROCESSING (AMITP 2016) | 2016, Vol. 60
Keywords
big data; entity coreference resolution; similarity; MapReduce
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Big data environments require processing and analyzing large-scale data, but data sets often contain many duplicate records that refer to the same entity, which makes the acquired data difficult to analyze and process. Cluster analysis is one of the main approaches to entity coreference resolution, but it is time-consuming and does not scale to big data environments. This paper presents a similarity-based method for entity coreference resolution that introduces weights and similarity measures and runs on the Hadoop platform with the MapReduce framework, converting the data into key-value pairs so that entity coreference resolution can be carried out efficiently. Experiments show that the proposed method greatly improves the speed and accuracy of entity coreference resolution and meets the demands of entity coreference resolution in big data environments.
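The abstract gives no formulas or parameters, so the following is only a minimal sketch of how a weighted-similarity comparison over key-value pairs might look, not the authors' implementation. It simulates the map, shuffle, and reduce steps in plain Python rather than on Hadoop. The attribute names, the weights, the threshold, the blocking key (first letter of the name), and the use of `difflib.SequenceMatcher` as the string similarity are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical attribute weights and threshold (the abstract gives no values).
WEIGHTS = {"name": 0.6, "city": 0.4}
THRESHOLD = 0.85


def attribute_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2):
    """Weighted sum of the per-attribute similarities."""
    return sum(w * attribute_similarity(r1[f], r2[f]) for f, w in WEIGHTS.items())


def map_phase(records):
    """Emit (key, record) pairs; the key is an assumed blocking key."""
    for rec in records:
        yield rec["name"][:1].lower(), rec


def shuffle(pairs):
    """Group values by key, as the MapReduce shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(group):
    """Compare records sharing a key; emit id pairs judged coreferent."""
    for r1, r2 in combinations(group, 2):
        if record_similarity(r1, r2) >= THRESHOLD:
            yield r1["id"], r2["id"]


def resolve(records):
    """Run the simulated map, shuffle, and reduce steps in sequence."""
    matches = []
    for group in shuffle(map_phase(records)).values():
        matches.extend(reduce_phase(group))
    return matches
```

Grouping by key before comparing means only records that share a key are compared pairwise, which is what makes the key-value form attractive for large data sets. The cost is that a poorly chosen key can separate true duplicates, so they are never compared.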
Pages: 110-116
Page count: 7