Hadoop Framework For Entity Resolution Within High Velocity Streams

Times cited: 4
Authors
Benny, S. Prabhakar [1]
Vasavi, S. [2]
Anupriya, P. [3]
Affiliations
[1] Kakatiya Univ, Univ Coll Engn Women, JNTU Hyderabad, Warangal, Telangana, India
[2] VR Siddhartha Engn Coll, Dept Comp Sci & Engn, Vijayawada, India
[3] VR Siddhartha Engn Coll, Dept Comp Sci & Engn, Vijayawada, India
Source
INTERNATIONAL CONFERENCE ON COMPUTATIONAL MODELLING AND SECURITY (CMS 2016) | 2016 / Vol. 85
Keywords
Big data; Entity Resolution; Hadoop Framework; Hive; Stream Processing;
DOI
10.1016/j.procs.2016.05.218
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Large amounts of data are being generated by sensors, satellites, social media, etc. This big data (characterized by volume, velocity, variety, veracity and value) can be processed so that decision makers can make timely decisions. This paper presents results of the proposed Hadoop framework, which performs entity resolution in the Map and Reduce phases. The MapReduce phase matches two real-world objects and generates rules. The similarity scores of these rules are used to match stream data during the testing phase. Similarity is calculated using 13 different semantic measures. These include token-based, edit-based, hybrid and phonetic similarity, as well as domain-dependent natural language processing measures. The semantic measures are implemented in Hive. The proposed system is tested using the e-catalogues of Amazon and Google. (C) 2016 The Authors. Published by Elsevier B.V.
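The abstract names four families of string similarity. The Python sketch below gives one common textbook representative of each:

- token-based: Jaccard over token sets
- edit-based: normalised Levenshtein distance
- hybrid: Monge-Elkan
- phonetic: American Soundex

These particular measures are assumptions chosen for illustration. So are the whitespace tokenizer and the hypothetical `is_match` rule with its 0.5 threshold. The paper's 13 measures were implemented in Hive and are not reproduced here.

```python
"""Illustrative similarity measures of the kinds named in the abstract.

Hypothetical sketch only; not the authors' Hive implementation.
"""


def tokens(s: str) -> set:
    # Assumed tokenizer: lowercase, split on whitespace.
    return set(s.lower().split())


def jaccard(a: str, b: str) -> float:
    # Token-based: |A & B| / |A | B| over token sets.
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def levenshtein(a: str, b: str) -> int:
    # Edit-based: dynamic-programming edit distance, keeping two rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    # Normalise edit distance into [0, 1].
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def monge_elkan(a: str, b: str) -> float:
    # Hybrid: best edit similarity of each token of a against b's tokens, averaged.
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    return sum(max(edit_similarity(x, y) for y in tb) for x in ta) / len(ta)


_SOUNDEX_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                   "4": "l", "5": "mn", "6": "r"}
_SOUNDEX = {c: d for d, letters in _SOUNDEX_GROUPS.items() for c in letters}


def soundex(word: str) -> str:
    # Phonetic: American Soundex code (first letter + three digits).
    word = "".join(c for c in word.lower() if c.isalpha())
    if not word:
        return ""
    out, last = word[0].upper(), _SOUNDEX.get(word[0], "")
    for c in word[1:]:
        d = _SOUNDEX.get(c, "")
        if d and d != last:
            out += d
        if c not in "hw":  # h and w do not separate letters with equal codes
            last = d
    return (out + "000")[:4]


def is_match(a: str, b: str, threshold: float = 0.5) -> bool:
    # Hypothetical decision rule: mean of three normalised scores vs. a threshold.
    score = (jaccard(a, b) + edit_similarity(a, b) + monge_elkan(a, b)) / 3
    return score >= threshold
```

Consider `jaccard("adobe photoshop cs3", "photoshop cs3 adobe")`. It returns 1.0 because token order is ignored. Both `soundex("Robert")` and `soundex("Rupert")` give `R163`. Averaging several measures, as `is_match` does, is only one simple way to combine scores. In the paper, matching during the testing phase instead relies on the similarity scores of rules generated in the MapReduce phase.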
Pages: 550-557
Page count: 8