An Efficient Batch Similarity Processing with MapReduce

Cited by: 0
Authors
Trong Nhan Phan [1]
Tran Khanh Dang [1]
Affiliations
[1] HCMC Univ Technol, VNU HCM, Fac Comp Sci & Engn, Ho Chi Minh City, Vietnam
Source
FUTURE DATA AND SECURITY ENGINEERING, FDSE 2018 | 2018 / Vol. 11251
Keywords
Similarity search; Batch processing; Lightweight indexing; MapReduce
DOI
10.1007/978-3-030-03192-3_12
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper, we study an efficient approach to batch similarity processing with MapReduce. Using the inverted index as a backbone, we embed metadata inside the indexes to minimize redundant data and thereby build lightweight indexes from the data sources. In addition, we propose a general query batch processing scheme that handles not only a single query but also sets of queries in an incremental manner. Moreover, we build the indexes in an ordered fashion so that we can prune quickly, discarding unnecessary objects and improving the performance of similarity search. Finally, we evaluate our proposed solution through empirical experiments on real datasets. The results verify the efficiency of our method for similarity search with query batches, especially when both the query sets and the data sets are large.
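The ideas named in the abstract (an inverted index carrying embedded metadata, ordered postings that allow early pruning, and a batch of queries answered against one index) can be illustrated with a minimal single-process sketch. This is not the authors' algorithm: the abstract does not specify the similarity measure, the metadata, or the ordering, so the sketch assumes Jaccard similarity over term sets, stores each object's term-set size as the metadata, and orders postings by that size. The names `build_index` and `batch_search` are hypothetical, and the map and reduce steps are only simulated in memory.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index whose postings are (size, doc_id) pairs.

    The term-set size is the (assumed) embedded metadata; keeping it in the
    posting means the original object is not needed at query time.
    """
    pairs = []
    for doc_id, text in docs.items():          # simulated "map": emit (term, posting)
        terms = set(text.lower().split())
        for term in terms:
            pairs.append((term, (len(terms), doc_id)))

    index = defaultdict(list)
    for term, posting in pairs:                # simulated "shuffle/reduce": group by term
        index[term].append(posting)
    for postings in index.values():
        postings.sort()                        # ordered by size, which enables early exit
    return dict(index)


def batch_search(index, queries, threshold):
    """Answer a batch of queries; assumes 0 < threshold <= 1."""
    results = {}
    for query_id, text in queries.items():
        q_terms = set(text.lower().split())
        q_size = len(q_terms)
        overlap = defaultdict(int)
        sizes = {}
        for term in q_terms:
            for size, doc_id in index.get(term, []):
                if size > q_size / threshold:
                    break                      # postings are sorted: the rest are too large
                if size < q_size * threshold:
                    continue                   # too small to reach the threshold
                overlap[doc_id] += 1
                sizes[doc_id] = size
        hits = []
        for doc_id, common in overlap.items():
            score = common / (q_size + sizes[doc_id] - common)   # Jaccard similarity
            if score >= threshold:
                hits.append((doc_id, score))
        results[query_id] = sorted(hits, key=lambda h: (-h[1], h[0]))
    return results
```

The size filter relies on the fact that the Jaccard similarity of two sets can never exceed the ratio of the smaller size to the larger one, so objects outside the range [threshold × |q|, |q| / threshold] can be skipped without computing a score. A real MapReduce job would distribute the map and reduce steps across workers rather than run them in one loop.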
Pages: 158 - 171
Page count: 14
Related papers
50 items in total
  • [41] Fast and scalable vector similarity joins with MapReduce
    Yang, Byoungju
    Kim, Hyun Joon
    Shim, Junho
    Lee, Dongjoo
    Lee, Sang-goo
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2016, 46 (03) : 473 - 497
  • [42] Privacy preserving similarity joins using MapReduce
    Ding, Xiaofeng
    Yang, Wanlu
    Choo, Kim-Kwang Raymond
    Wang, Xiaoli
    Jin, Hai
    INFORMATION SCIENCES, 2019, 493 : 20 - 33
  • [43] Practising Scalable Graph Similarity Joins in MapReduce
    Chen, Yifan
    Zhao, Xiang
    Ge, Bin
    Xiao, Chuan
    Chi, Chi-Hung
    2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014, : 112 - 119
  • [44] Fast and scalable vector similarity joins with MapReduce
    Byoungju Yang
    Hyun Joon Kim
    Junho Shim
    Dongjoo Lee
    Sang-goo Lee
    Journal of Intelligent Information Systems, 2016, 46 : 473 - 497
  • [45] Spatial Data Processing with MapReduce
    Gunawardena, Tilani
    Vicari, Annamaria
    Mecca, Giansalvatore
    2015 IEEE 10TH INTERNATIONAL CONFERENCE ON INDUSTRIAL AND INFORMATION SYSTEMS (ICIIS), 2015, : 485 - 490
  • [46] Simplifying MapReduce data processing
    Liao, Chih-Shan
    Shih, Jin-Ming
    Chang, Ruay-Shiung
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2013, 8 (03) : 219 - 226
  • [47] An efficient mechanism for processing similarity search queries in sensor networks
    Chung, Yu-Chi
    Su, I-Fang
    Lee, Chiang
    INFORMATION SCIENCES, 2011, 181 (02) : 284 - 307
  • [48] Classification of Knowledge Processing by MapReduce
    Benhamed, Siham
    Nait-Bahloul, Safia
    2014 4TH INTERNATIONAL SYMPOSIUM ISKO-MAGHREB: CONCEPTS AND TOOLS FOR KNOWLEDGE MANAGEMENT (ISKO-MAGHREB), 2014,
  • [49] Efficient Storage and Processing of Video Data for Moving Object Detection Using Hadoop/MapReduce
    Parsola, Jyoti
    Gangodkar, Durgaprasad
    Mittal, Ankush
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL, NETWORKS, COMPUTING, AND SYSTEMS (ICSNCS 2016), VOL 1, 2017, 395 : 137 - 147
  • [50] Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework
    Moutafis, Panagiotis
    Mavrommatis, George
    Vassilakopoulos, Michael
    Sioutas, Spyros
    DATA & KNOWLEDGE ENGINEERING, 2019, 121 : 42 - 70