An Efficient Batch Similarity Processing with MapReduce

Authors
Trong Nhan Phan [1]
Tran Khanh Dang [1]
Affiliations
[1] HCMC Univ Technol, VNU HCM, Fac Comp Sci & Engn, Ho Chi Minh City, Vietnam
Source
FUTURE DATA AND SECURITY ENGINEERING, FDSE 2018 | 2018, Vol. 11251
Keywords
Similarity search; Batch processing; Lightweight indexing; MapReduce
DOI
10.1007/978-3-030-03192-3_12
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
In this paper, we study an efficient approach to batch similarity processing with MapReduce. Using the inverted index as a backbone, we embed metadata in the indexes to minimize redundant data and thereby build lightweight indexes from the data sources. In addition, we propose a general query batch processing scheme that handles not only a single query but also sets of queries in an incremental manner. Moreover, we build the indexes in an ordered fashion so that unnecessary objects can be pruned quickly, which improves the performance of similarity search. Finally, we evaluate the proposed solution through empirical experiments on real datasets. The results verify the efficiency of our method for similarity search with query batches, especially when both the query sets and the datasets are large.
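The abstract names three ingredients: an inverted index with embedded metadata, ordered postings that allow pruning, and incremental processing of query batches. The following is a minimal single-process sketch of how these could fit together, not the authors' implementation. Everything specific in it is an assumption for illustration: the embedded metadata is taken to be the document's term-set size, the ordering is by that size, the similarity measure is Jaccard, and the function names (`map_index`, `reduce_index`, `build_index`, `batch_search`) are hypothetical. The map, shuffle, and reduce steps are simulated in plain Python rather than run on a MapReduce cluster.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lowercase terms (assumed representation)."""
    return set(text.lower().split())


def map_index(doc_id, text):
    """Map step: emit (term, (doc_id, doc_size)).

    doc_size is the assumed 'embedded metadata': it lets the search compute
    Jaccard similarity without re-reading the source document.
    """
    terms = tokenize(text)
    for term in terms:
        yield term, (doc_id, len(terms))


def reduce_index(term, postings):
    """Reduce step: order postings by document size to enable pruning."""
    return term, sorted(postings, key=lambda p: (p[1], p[0]))


def build_index(docs):
    grouped = defaultdict(list)  # stands in for the shuffle step
    for doc_id, text in docs.items():
        for term, posting in map_index(doc_id, text):
            grouped[term].append(posting)
    return dict(reduce_index(t, ps) for t, ps in grouped.items())


def batch_search(index, queries, threshold, results=None):
    """Answer a batch of queries; pass earlier results to extend them.

    Assumes 0 < threshold <= 1. Returns {query_id: [(doc_id, jaccard), ...]}.
    """
    results = {} if results is None else results
    for qid, text in queries.items():
        q_terms = tokenize(text)
        q_size = len(q_terms)
        overlap = defaultdict(int)
        sizes = {}
        for term in q_terms:
            for doc_id, d_size in index.get(term, []):
                # Jaccard >= t implies t * |q| <= |d| <= |q| / t.
                if d_size < threshold * q_size:
                    continue  # too small to reach the threshold
                if d_size > q_size / threshold:
                    break  # size-ordered postings: the rest are too large
                overlap[doc_id] += 1
                sizes[doc_id] = d_size
        hits = []
        for doc_id, common in overlap.items():
            sim = common / (q_size + sizes[doc_id] - common)
            if sim >= threshold:
                hits.append((doc_id, sim))
        results[qid] = sorted(hits, key=lambda h: (-h[1], h[0]))
    return results


docs = {
    "d1": "map reduce similarity search",
    "d2": "map reduce batch processing",
    "d3": "inverted index",
    "d4": "map reduce similarity search with lightweight inverted index "
          "structures for large data",
}
index = build_index(docs)

# First batch, then a second batch added incrementally to the same results.
results = batch_search(index, {"q1": "map reduce similarity",
                               "q2": "inverted index"}, threshold=0.5)
results = batch_search(index, {"q3": "batch processing"}, 0.5, results)
print(results)
```

In this sketch the size ordering is what makes the early `break` safe: once one posting exceeds the upper size bound, every later posting for that term does too, so the 12-term document `d4` is never scored for the short queries.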
Pages: 158-171
Page count: 14