An Efficient Batch Similarity Processing with MapReduce

Cited by: 0

Authors:

Trong Nhan Phan [1]

Tran Khanh Dang [1]

Affiliation:

[1] HCMC Univ Technol, VNU HCM, Fac Comp Sci & Engn, Ho Chi Minh City, Vietnam

Source:

FUTURE DATA AND SECURITY ENGINEERING, FDSE 2018 | 2018 / Vol. 11251

Keywords:

Similarity search; Batch processing; Lightweight indexing; MapReduce

DOI:

10.1007/978-3-030-03192-3_12

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

In this paper, we study an efficient approach to batch similarity processing with MapReduce. Using the inverted index as a backbone, we embed metadata inside the indexes to minimize redundant data and thus build lightweight indexes from the data sources. In addition, we propose a general query batch processing scheme that not only handles a single query but also processes sets of queries incrementally. Moreover, we build the indexes in an ordered fashion so that we can perform quick pruning, which discards unnecessary objects and improves the performance of similarity search. Finally, we evaluate our proposed solution through empirical experiments on real datasets. The results verify the efficiency of our method for similarity search with query batches, especially when both the query sets and the data sets are large.
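The abstract describes the approach only at a high level. As a rough illustration of the general idea (an inverted index whose postings carry a small piece of metadata and are kept in sorted order, so that a batch of queries can share one index and prune candidates early), here is a minimal single-machine sketch in Python. It assumes Jaccard similarity over token sets, a threshold 0 < t ≤ 1, and record set size as the embedded metadata; the names `build_index` and `batch_search` are hypothetical. This is not the authors' algorithm, and it does not distribute work with MapReduce.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical record type: (record id, set of tokens).
Record = Tuple[str, FrozenSet[str]]
# Hypothetical index type: token -> postings of (set size, record id).
Index = Dict[str, List[Tuple[int, str]]]

EPS = 1e-9  # slack so floating-point rounding never prunes a true match


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_index(data: List[Record]) -> Index:
    """Build an inverted index whose postings embed each record's set size.

    Postings are sorted by size, so a search can skip records that are too
    small and stop scanning once records become too large.
    """
    index: Index = defaultdict(list)
    for rid, tokens in data:
        for tok in tokens:
            index[tok].append((len(tokens), rid))
    for postings in index.values():
        postings.sort()
    return dict(index)


def batch_search(
    index: Index, data: List[Record], queries: List[Record], threshold: float
) -> Dict[str, List[Tuple[str, float]]]:
    """Answer a batch of queries against one shared index (assumes 0 < threshold <= 1)."""
    records = dict(data)
    results: Dict[str, List[Tuple[str, float]]] = {}
    for qid, q in queries:
        # Length filter: J(q, r) >= t implies t*|q| <= |r| <= |q|/t.
        lo = threshold * len(q) - EPS
        hi = len(q) / threshold + EPS
        candidates = set()
        for tok in q:
            for size, rid in index.get(tok, []):
                if size < lo:
                    continue  # too small to reach the threshold
                if size > hi:
                    break  # sorted by size: every later posting is too large
                candidates.add(rid)
        # Verification: exact similarity is computed only for surviving candidates.
        scored = ((rid, jaccard(q, records[rid])) for rid in candidates)
        results[qid] = sorted((rid, round(s, 3)) for rid, s in scored if s >= threshold)
    return results


if __name__ == "__main__":
    data = [
        ("d1", frozenset("abc")),
        ("d2", frozenset("abcd")),
        ("d3", frozenset("xy")),
        ("d4", frozenset("abcdefgh")),
    ]
    queries = [("q1", frozenset("abc")), ("q2", frozenset("xyz"))]
    index = build_index(data)
    print(batch_search(index, data, queries, threshold=0.5))
    # {'q1': [('d1', 1.0), ('d2', 0.75)], 'q2': [('d3', 0.667)]}
```

The length filter follows from J(a, b) ≤ min(|a|, |b|) / max(|a|, |b|), so J(a, b) ≥ t forces t·|a| ≤ |b| ≤ |a|/t. In the example, record `d4` (size 8) is never verified for `q1` because the scan over each size-sorted posting list stops as soon as it passes the upper bound of 6. A later batch can reuse the same index without rebuilding it.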


Pages: 158-171

Number of pages: 14

50 records in total

[21] Efficient Processing of Area Skyline Query in MapReduce Framework
Choudhury, Zakia Zinat
Zaman, Asif
Hamid, Md Ekramul
2018 4TH IEEE INTERNATIONAL WIE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (IEEE WIECON-ECE 2018), 2018: 79-82
[22] Efficient Processing Distributed Joins with Bloomfilter using MapReduce
Zhang, Changchun
Wu, Lei
Li, Jing
INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2013, 6(03): 43-57
[23] Parallelized Similarity Flooding Algorithm for Processing Large Scale Graph Datasets with MapReduce
Zhang, Jian
Yuan, Chunfeng
Huang, Yihua
2012 13TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS, AND TECHNOLOGIES (PDCAT 2012), 2012: 184-188
[24] Efficient Similarity Join Based on Earth Mover's Distance Using MapReduce
Xu, Jia
Lei, Bin
Gu, Yu
Winslett, Marianne
Yu, Ge
Zhang, Zhenjie
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27(08): 2148-2162
[25] An efficient MapReduce scheduling scheme for processing large multimedia data
Bok, Kyoungsoo
Hwang, Jaemin
Lim, Jongtae
Kim, Yeonwoo
Yoo, Jaesoo
MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76(16): 17273-17296
[27] FP-Hadoop: Efficient processing of skewed MapReduce jobs
Liroz-Gistau, Miguel
Akbarinia, Reza
Agrawal, Divyakant
Valduriez, Patrick
INFORMATION SYSTEMS, 2016, 60: 69-84
[28] Efficient Batch Parallel Online Sequential Extreme Learning Machine Algorithm Based on MapReduce
Huang, Shan
Wang, Botao
Chen, Yuemei
Wang, Guoren
Yu, Ge
PROCEEDINGS OF ELM-2015, VOL 1: THEORY, ALGORITHMS AND APPLICATIONS (I), 2016, 6: 13-25
[29] Efficient and Scalable Processing of String Similarity Join
Rong, Chuitian
Lu, Wei
Wang, Xiaoli
Du, Xiaoyong
Chen, Yueguo
Tung, Anthony K. H.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25(10): 2217-2230
[30] Efficient Top-k Similarity Join of Massive Time Series Using MapReduce
Chen, Dehua
Shen, Changgan
Li, Yue
Le, Jiajin
Rong, Chunming
JOURNAL OF INTERNET TECHNOLOGY, 2014, 15(06): 1025-1032
