An Efficient Batch Similarity Processing with MapReduce

Times cited: 0
Authors
Trong Nhan Phan [1]
Tran Khanh Dang [1]
Affiliation
[1] HCMC Univ Technol, VNU HCM, Fac Comp Sci & Engn, Ho Chi Minh City, Vietnam
Source
FUTURE DATA AND SECURITY ENGINEERING, FDSE 2018 | 2018 / Vol. 11251
Keywords
Similarity search; Batch processing; Lightweight indexing; MapReduce;
DOI
10.1007/978-3-030-03192-3_12
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
In this paper, we study an efficient approach to batch similarity processing with MapReduce. Using the inverted index as a backbone, we embed metadata inside the indexes to minimize redundant data and thereby build lightweight indexes from the data sources. In addition, we propose a general query batch processing scheme that handles not only a single query but also sets of queries in an incremental manner. Moreover, we build the indexes in an ordered fashion so that we can quickly prune unnecessary objects, which improves the performance of similarity search. Finally, we evaluate the proposed solution through empirical experiments on real datasets. The results verify the efficiency of our method for similarity search with query batches, especially when both the query sets and the data sets are large.
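To make the ideas in the abstract more concrete, below is a minimal single-process sketch; it is not the authors' implementation. It assumes that objects are token sets compared with Jaccard similarity under a threshold t (the abstract names neither the similarity measure nor the pruning rule). It simulates the MapReduce map, shuffle, and reduce steps in plain Python, embeds each object's size in its postings as the "metadata", and keeps each posting list sorted by size so that a size filter (t·|q| ≤ |x| ≤ |q|/t, which any object with Jaccard ≥ t must satisfy) can skip objects that cannot qualify. All function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`, `search_batch`) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def map_phase(records):
    """Map (hypothetical): emit (token, (size, object_id)) for each token of each object.
    The object's size travels with the posting as embedded metadata."""
    for obj_id, tokens in records.items():
        size = len(tokens)
        for tok in tokens:
            yield tok, (size, obj_id)


def shuffle(pairs):
    """Group intermediate pairs by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce (hypothetical): sort each posting list by object size and store it
    as two parallel lists so a size range can be located by binary search."""
    index = {}
    for tok, postings in groups.items():
        postings.sort()
        index[tok] = ([s for s, _ in postings], [i for _, i in postings])
    return index


def build_index(records):
    """records: dict of object_id -> set of tokens (an assumed data model)."""
    return reduce_phase(shuffle(map_phase(records)))


def search_batch(index, queries, threshold):
    """Return, for every query in the batch, the ids of objects whose Jaccard
    similarity with the query is at least `threshold`."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    results = {}
    for qid, q in queries.items():
        qsize = len(q)
        # Size filter: Jaccard >= t implies t*|q| <= |x| <= |q|/t.
        lo, hi = threshold * qsize, qsize / threshold
        overlap = defaultdict(int)  # object_id -> |q ∩ x|
        cand_size = {}              # object_id -> |x|, read from the index metadata
        for tok in q:
            sizes, ids = index.get(tok, ([], []))
            # Postings are ordered by size, so only one contiguous slice can qualify.
            start, end = bisect_left(sizes, lo), bisect_right(sizes, hi)
            for pos in range(start, end):
                overlap[ids[pos]] += 1
                cand_size[ids[pos]] = sizes[pos]
        matches = []
        for obj_id, inter in overlap.items():
            union = qsize + cand_size[obj_id] - inter
            if inter / union >= threshold:
                matches.append(obj_id)
        results[qid] = sorted(matches)
    return results
```

For example, with objects `d1 = {a, b, c}`, `d2 = {a, b, c, d}`, `d3 = {x, y}`, `d4 = {a, ..., h}` and threshold 0.7, the query `{a, b, c}` returns `d1` and `d2`; `d4` is pruned by the size filter without its similarity ever being computed. Sorting postings by size during the reduce step is one plausible reading of "ordered" indexes; the paper may order or prune differently.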
Pages: 158-171
Number of pages: 14
Related papers
50 in total
  • [31] Efficient Snapshot KNN Join Processing for Large Data Using MapReduce
    Hu, Yupeng
    Yang, Chong
    Ji, Cun
    Xu, Yang
    Li, Xueqing
    2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2016, : 713 - 720
  • [32] Efficient Similarity Join for Massive Time Sequences Using Locality Sensitive Hash and Mapreduce
    Chen, Dehua
    Zheng, Liangliang
    Zhou, Meng
    Yu, Shoujian
    2013 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA (CLOUDCOM-ASIA), 2013, : 529 - 533
  • [33] Efficient Multi-dimensional Spatial RkNN Query Processing with MapReduce
    Ji, Changqing
    Hu, Hongbin
    Xu, Yujie
    Li, Yuanyuan
    Qu, Wenyu
    2013 8TH CHINAGRID ANNUAL CONFERENCE (CHINAGRID), 2013, : 63 - 68
  • [34] Metric Similarity Joins Using MapReduce
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (03) : 656 - 669
  • [35] RStream: Simple and Efficient Batch and Stream Processing at Scale
    Fino, Alessio
    Margara, Alessandro
    Cugola, Gianpaolo
    Donadoni, Marco
    Morassutto, Edoardo
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 2764 - 2774
  • [36] RHJoin: A Fast and Space-efficient Join Method for Log Processing in MapReduce
    Tang, Dixin
    Liu, Taoying
    Liu, Hong
    Li, Wei
    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014, : 975 - 980
  • [37] Architecture of Efficient Word Processing using Hadoop MapReduce for Big Data Applications
    Mandal, Bichitra
    Sahoo, Ramesh Kumar
    Sethi, Srinivas
    PROCEEDINGS 2015 INTERNATIONAL CONFERENCE ON MAN AND MACHINE INTERFACING (MAMI), 2015,
  • [38] Prefetched wald adaptive boost classification based Czekanowski similarity MapReduce for user query processing with bigdata
    Selvan, S. Tamil
    Balamurugan, P.
    Vijayakumar, M.
    DISTRIBUTED AND PARALLEL DATABASES, 2021, 39 (04) : 855 - 872
  • [40] Fast and scalable vector similarity joins with MapReduce
    Yang, Byoungju
    Kim, Hyun Joon
    Shim, Junho
    Lee, Dongjoo
    Lee, Sang-goo
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2016, 46 : 473 - 497