A SPARQL query processing system using map-phase-multi join for big data in clouds

Cited by: 1
Authors
Huang S.-W. [1]
Yu C.-H. [1]
Shieh C.-K. [1]
Tsai M.-F. [2]
Affiliations
[1] Department of Electrical Engineering, Institute of Computer and Communication Engineering, National Cheng Kung University
[2] Department of Electronic Engineering, National United University
Keywords
big data; linked data; MapReduce; NoSQL; SPARQL; TripleStore
DOI
10.1504/IJIPT.2017.087555
Abstract
Big data refers to datasets so large and complex that traditional data processing tools struggle to store and analyse them. Linked data is one approach to handling big data, in which the data are stored and processed in a TripleStore. For huge datasets, a TripleStore requires more scalable techniques. The MapReduce programming model is among the most representative cloud technologies. Several approaches use MapReduce to serve SPARQL queries, but they still exhibit unacceptable performance on complex queries. In this paper, we propose a map-phase-multi-join algorithm for processing SPARQL queries. Using multi-join, job initialisation time is reduced by avoiding iterative MapReduce jobs. Furthermore, map-phase join saves bandwidth by preventing data that does not participate in a join from being transferred among computing nodes. We also design a storage schema and a join-order rule that enhance the performance of our system. The evaluation results show that our system outperforms traditional join approaches in most queries. Copyright © 2017 Inderscience Enterprises Ltd.
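The idea of joining several triple patterns in one map-style pass, with no shuffle or reduce step between joins, can be illustrated with a minimal Python sketch. This is not the authors' algorithm, storage schema, or join-order rule, none of which are specified in this record. The predicate-keyed index, the toy triples, and the names build_index, match and map_phase_multi_join are all hypothetical, and the sketch assumes every pattern has a constant predicate.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object); invented purely for illustration.
TRIPLES = [
    ("alice", "type", "Student"),
    ("bob", "type", "Student"),
    ("carol", "type", "Professor"),
    ("alice", "advisor", "carol"),
    ("bob", "advisor", "dave"),
    ("carol", "worksFor", "dept1"),
]

# Three triple patterns sharing variables, as in a SPARQL basic graph pattern.
QUERY = [
    ("?x", "type", "Student"),
    ("?x", "advisor", "?y"),
    ("?y", "worksFor", "?d"),
]


def build_index(triples):
    """Group (subject, object) pairs by predicate (a hypothetical layout)."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[p].append((s, o))
    return index


def is_var(term):
    return term.startswith("?")


def match(pattern, binding, index):
    """Yield every extension of `binding` that satisfies one triple pattern."""
    s_term, pred, o_term = pattern  # assumption: the predicate is a constant
    for s, o in index.get(pred, []):
        new = dict(binding)
        ok = True
        for term, value in ((s_term, s), (o_term, o)):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def map_phase_multi_join(patterns, index):
    """Join all patterns in a single pass; partial matches are dropped early."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b, index)]
    return bindings


if __name__ == "__main__":
    for row in map_phase_multi_join(QUERY, build_index(TRIPLES)):
        print(row)
```

In a real MapReduce deployment each mapper would run such a probe over its own input split with the other patterns' data available locally, so bindings that fail a later pattern (here, bob, whose advisor has no worksFor triple) are never sent across the network.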
Pages: 177-188
Number of pages: 11