An Efficient Batch Similarity Processing with MapReduce

Cited by: 0
Authors
Trong Nhan Phan [1]
Tran Khanh Dang [1]
Affiliations
[1] HCMC Univ Technol, VNU HCM, Fac Comp Sci & Engn, Ho Chi Minh City, Vietnam
Source
FUTURE DATA AND SECURITY ENGINEERING, FDSE 2018 | 2018 / Vol. 11251
Keywords
Similarity search; Batch processing; Lightweight indexing; MapReduce;
DOI
10.1007/978-3-030-03192-3_12
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In this paper, we study an efficient approach to batch similarity processing with MapReduce. With the inverted index as a backbone, we embed metadata inside the indexes to minimize redundant data and build lightweight indexes from the data sources. In addition, we propose a general query batch processing scheme that handles not only a single query but also sets of queries in an incremental manner. Moreover, we build the indexes in an ordered fashion so that we can prune quickly, discarding unnecessary objects and improving the performance of similarity search. Finally, we evaluate the proposed solution through empirical experiments on real datasets. The results verify the efficiency of our method for similarity search with query batches, especially when both the query sets and the data sets are large.
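The ideas named in the abstract (an inverted index carrying embedded metadata, ordered posting lists that allow pruning, and a batch of queries evaluated in one pass) can be illustrated with a small in-memory sketch. It is not the authors' algorithm: the abstract does not specify the similarity measure, the metadata, or the pruning rule, so this sketch assumes Jaccard similarity over term sets, uses the object's set size as the embedded metadata, and prunes with the standard Jaccard length filter. The function names `build_index` and `batch_search` are hypothetical, and the map and reduce phases are simulated with plain Python loops rather than a real MapReduce runtime.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: term -> sorted list of (set_size, doc_id).

    Assumption: the set size is the metadata embedded in each posting,
    and postings are ordered by it so that pruning can stop early.
    """
    index = defaultdict(list)
    for doc_id, terms in docs.items():      # "map": emit (term, posting)
        term_set = set(terms)
        for term in term_set:
            index[term].append((len(term_set), doc_id))
    for postings in index.values():         # "reduce": order each list
        postings.sort()
    return dict(index)


def batch_search(index, queries, threshold):
    """Answer a whole batch of queries; threshold must be in (0, 1]."""
    overlaps = defaultdict(int)   # (query_id, doc_id) -> shared term count
    doc_sizes = {}
    query_sizes = {}
    for query_id, terms in queries.items():
        query = set(terms)
        query_sizes[query_id] = len(query)
        # Length filter: Jaccard >= t implies t*|q| <= |d| <= |q|/t.
        min_size = threshold * len(query)
        max_size = len(query) / threshold
        for term in query:
            postings = index.get(term, [])
            start = bisect_left(postings, (min_size,))  # skip small sets
            for size, doc_id in postings[start:]:
                if size > max_size:                     # ordered: stop early
                    break
                overlaps[(query_id, doc_id)] += 1
                doc_sizes[doc_id] = size
    results = defaultdict(list)
    for (query_id, doc_id), shared in overlaps.items():
        union = query_sizes[query_id] + doc_sizes[doc_id] - shared
        similarity = shared / union
        if similarity >= threshold:
            results[query_id].append((doc_id, similarity))
    for hits in results.values():
        hits.sort(key=lambda hit: (-hit[1], hit[0]))    # best match first
    return dict(results)


# Toy data, invented for illustration only.
docs = {
    "d1": ["a", "b", "c"],
    "d2": ["a", "b", "c", "d"],
    "d3": ["a"],
    "d4": ["x", "y"],
}
queries = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
results = batch_search(build_index(docs), queries, threshold=0.5)
```

In this toy run, `d3` shares a term with `q1` but is skipped by the length filter before any overlap is counted, which is the kind of saving that ordered postings make possible. Because each posting already carries the set size, the similarity is computed from the index alone, without reading the original objects again.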
Pages: 158 - 171
Page count: 14
Related papers
50 in total
  • [21] Efficient Processing of Area Skyline Query in MapReduce Framework
    Choudhury, Zakia Zinat
    Zaman, Asif
    Hamid, Md Ekramul
    2018 4TH IEEE INTERNATIONAL WIE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (IEEE WIECON-ECE 2018), 2018, : 79 - 82
  • [22] Efficient Processing Distributed Joins with Bloomfilter using MapReduce
    Zhang, Changchun
    Wu, Lei
    Li, Jing
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2013, 6 (03) : 43 - 57
  • [23] Parallelized Similarity Flooding Algorithm for Processing Large Scale Graph Datasets with MapReduce
    Zhang, Jian
    Yuan, Chunfeng
    Huang, Yihua
    2012 13TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS, AND TECHNOLOGIES (PDCAT 2012), 2012, : 184 - 188
  • [24] Efficient Similarity Join Based on Earth Mover's Distance Using MapReduce
    Xu, Jia
    Lei, Bin
    Gu, Yu
    Winslett, Marianne
    Yu, Ge
    Zhang, Zhenjie
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (08) : 2148 - 2162
  • [25] An efficient MapReduce scheduling scheme for processing large multimedia data
    Bok, Kyoungsoo
    Hwang, Jaemin
    Lim, Jongtae
    Kim, Yeonwoo
    Yoo, Jaesoo
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (16) : 17273 - 17296
  • [26] An efficient MapReduce scheduling scheme for processing large multimedia data
    Kyoungsoo Bok
    Jaemin Hwang
    Jongtae Lim
    Yeonwoo Kim
    Jaesoo Yoo
    Multimedia Tools and Applications, 2017, 76 : 17273 - 17296
  • [27] FP-Hadoop: Efficient processing of skewed MapReduce jobs
    Liroz-Gistau, Miguel
    Akbarinia, Reza
    Agrawal, Divyakant
    Valduriez, Patrick
    INFORMATION SYSTEMS, 2016, 60 : 69 - 84
  • [28] Efficient Batch Parallel Online Sequential Extreme Learning Machine Algorithm Based on MapReduce
    Huang, Shan
    Wang, Botao
    Chen, Yuemei
    Wang, Guoren
    Yu, Ge
    PROCEEDINGS OF ELM-2015, VOL 1: THEORY, ALGORITHMS AND APPLICATIONS (I), 2016, 6 : 13 - 25
  • [29] Efficient and Scalable Processing of String Similarity Join
    Rong, Chuitian
    Lu, Wei
    Wang, Xiaoli
    Du, Xiaoyong
    Chen, Yueguo
    Tung, Anthony K. H.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (10) : 2217 - 2230
  • [30] Efficient Top-k Similarity Join of Massive Time Series Using MapReduce
    Chen, Dehua
    Shen, Changgan
    Li, Yue
    Le, Jiajin
    Rong, Chunming
    JOURNAL OF INTERNET TECHNOLOGY, 2014, 15 (06) : 1025 - 1032